A WORD TO THE TEACHER 


This book is the result of fifteen years of practical experience in 
teaching statistics to more than 3,000 college students ranging 
from the sophomore to the graduate level. It was developed and 
taught by the laboratory method. Much of the material is pitched 
to the sophomore level. Some of it is more difficult. After teaching
the subject for thirty semesters the author is convinced that a 
large body of fairly difficult statistical theory and methods can 
be taught to sophomores and juniors if it is presented in a practical 
non-technical way. He is also convinced that statistics should 
first be taught to most students from the standpoint of preparing 
them to be consumers of statistics instead of creators of researches. 
Those who learn first to consume, to understand, statistics which 
others have prepared will, if interested and as the need arises, begin 
to do creative work. 

This text is set up in five parts to meet the teachers’ needs for 
subject matter on various levels of difficulty adapted to students 
of successive levels of age and attainment. The following combi- 
nations of the five parts are suggested for the various levels. 

1. For Junior College students and for Sophomore classes in 
Senior colleges, use only Parts I and II, the first 14 chapters, for 
a one-semester course. 

2. For Normal Schools and Teachers’ Colleges, use Parts I 
and II and portions of Chapters 20, 21, and 24, for a one-semester 
course. 

3. For Schools of Commerce and classes in Economics and 
Business Administration, use Parts I, II, and III and portions of 
Chapters 20, 23, and 24, for a one-semester course. 
4. For Schools of Agriculture, use Parts I, II, and IV and parts 
of Chapters 16 to 19, 23, and 24, for a one-semester course. 

5. For courses in Sociology, use Parts I and II and Chapters 
21 and 22 and parts of Chapters 23 and 24, for a one-semester 
course. 

6. For a two-semester course, use Parts I, II, and III for the 
first semester and Parts IV and V with additional problems over 
the more difficult portions of Parts II and III for the second 
semester. 

7. For a one-semester course for advanced or graduate students 
who have had one or more semesters of statistics already, use 
Parts II, III, IV and V as the core outline of the course supple- 
mented with outside readings from the Selected References at 
the ends of the chapters. Also use more advanced and difficult 
materials for laboratory problems. 

Because of the gradual and progressive difficulty of the material 
of this text from Part I through Part V, it will be found to 
serve well as a coherent and progressive outline for a series of 
lectures on statistics which might add additional materials from 
other texts or from original sources. It may well serve as the core 
material for many types of courses which are varied in detail to 
meet special requirements. It is not thought that any teacher 
will expect to use all the material in this text in one semester. 
Rather will he select from the several Parts those particular and 
limited materials for which his course is designed. It is believed 
that by this adaptive, selective method much better results will 
be obtained. The wise teacher will always fit his subject matter 
to his students. 

TEACHER’S USE OF EXERCISES AND PROBLEMS 

The teacher will notice that no specific and detailed instructions 
are given in most cases as to the use of the exercises and data at 
the end of each chapter. The author has two reasons for this 
omission. 

First, most of the data included are suitable for many uses and 
for the computation of a large number of statistical measures. To 
specify a few specific problems in each case would be to limit the 
full possibilities of the use of the data. To specify all profitable 
computations would be to lengthen unduly the assignments and 
to make them seem confusing to the students. 

Second, different teachers in different classes of varying degrees 
of maturity and with varying objectives will wish to emphasize 
different phases of statistical methods with their assigned problems. 
To achieve the best results each teacher will wish to determine and 
assign his own problems for the particular purpose in mind. This 
practice vitalizes the work. It also makes possible the use of the 
same data for many semesters without routine and monotonous 
duplications. Such a method makes the teacher, the pupil, and 
the text partners in a vital problem for a specific objective. It 
gives teaching and study a sense of greater zest and practical 
purpose. 


REFERENCE BOOK 

The businessman and practicing statistician will also find this 
text a valuable reference book. The use of complete worksheets 
and the detailed solution of problems not only makes the learning 
process quicker and easier for the student but it enables the 
businessman in his office and the research worker to have a model-to-go-by
in organizing his statistical work. The large number of 
reference tables enables the businessman, the student, and the re- 
search worker to find the measure of error, the required formula 
or the specific information desired with the least time and effort. 
Business executives will find the materials in the book easy to 
locate and use. It will be a convenient Handbook of Statistics. 

M. M. B. 

Stillwater, Oklahoma 
March 17, 1944 




A WORD TO THE STUDENT 


Statistics is considered a dry subject by many college students. 
It often is. The purpose of this book is to supply an adequate 
quantity of moisture in the form of simple, well-organized material 
presented with a minimum of mathematics and a strong emphasis 
on practical, commonplace, everyday application to the problems 
of life. 

More and more, statistics is taught in most colleges from the 
standpoint of consumption rather than with a view to creation. 
Not one student in ten, perhaps not one in fifty, will ever become 
a productive statistician or a research worker, or even a statistical 
clerk. But our civilization has reached the point where it is 
impossible to read intelligently the baseball scores, the stock 
market reports, the weather report, the business section of the 
daily paper, trade magazines, popular books on geography, travel, 
economics, sociology, agriculture, or production without an ele- 
mentary knowledge of statistics. The reason for this condition is 
that statistics are a kind of shorthand, a time-saving device in a 
busy world to say a great deal in a small space and in a short time. 

Today much advertising is done on the printed page, poster, 
and over the radio in the form of statistics. Banks, labor unions, 
government agencies, insurance companies, railroads, retail stores, 
military organizations, and factories employ vast displays of sta- 
tistical material for the information of their employees, their 
owners, and for the general public. The person who has no 
knowledge of these simple uses of statistics is definitely handicapped 
in the modern world. This book is presented to you as a quick 
and easy way of acquiring this necessary information. Its subject 
matter cannot be learned without study, but with an ordinary 
amount of conscientious effort the average student can acquire 
this knowledge in a single semester. It is well worth your effort. 
You will have daily use for many of these methods. You will 
need them in your business. 
Part One 
General Methods 


CHAPTER 1 

THE MEANING AND USE 
OF STATISTICS 


Statistics is the science and method of analyzing groups of re- 
lated numbers in order to discover their relationships and mean- 
ings. A single number does not lend itself to statistical analysis. 
The wage of one workman for a single month is too limited a 
quantity for comparative study. There must be at least two num- 
bers before comparisons can be made or relationships discovered. 
The monthly wage of a workman for two or more months, or the 
wages of two or more workmen, is a fit subject for study. Not a 
datum, but data are the subject matter of statistics. 

In common usage the term “statistics” is employed to signify 
any group of figures or data. A common statement is that the 
lecturer quoted many statistics or that the book is dry and uninteresting
because it is full of statistics. Such usage makes 
“statistics” equivalent to “numbers.” This is a limited, inadequate,
and, in a sense, a wrong use of the term. The word 
“data” is the proper term with which to designate raw, or unanalyzed,
numbers. In the true sense, statistics appear only when 
the data have been organized and analyzed and the relationships 
existing among them have been stated in summary or relative 
terms. 




Statistical methods are ways of thinking that are almost uni- 
versal. Millions of persons who have never heard of statistical 
theory follow its basic principles every day in making the common 
necessary decisions of life. The housewife in shopping for groceries 
reads the various food advertisements and inquires as to prices 
of various articles at several stores until she arrives at a workable 
idea of the level, average, and range of prices. In her daily efforts 
to get the most of quality and quantity for her money, she is, 
perhaps, without knowing it, using sound and adequate statistical 
methods. The new boy at the party as he dances, talks, and 
laughs with the various girls quickly forms an opinion as to their 
grace, ease, culture, charm, and sophistication, and unconsciously 
compares them with the girls back home or in other communities 
as higher or lower, better or worse. If the young man had ob- 
tained his information concerning these girls from a long formal 
questionnaire which gave in quantitative measurements the in- 
come and occupation of their parents, the kind of car they drove, 
their years in school, their sports, hobbies, opinions, and social 
and religious associates, and a hundred other points, all of which 
had been summed, averaged, correlated, and stated in terms of 
ratios, indexes, coefficients, and other formal statistics, he would 
have had a fuller and more accurate measure of this group of girls, 
but the formal questionnaire would have differed from his method at 
the party only in being more mathematical and detailed, not in principle. 
In all our social relationships we are compelled to make decisions 
about associates and friends on the basis of samples. When boys 
and girls complete high school or college, stores, mills, and fac- 
tories employ them on the basis of the data in their transcripts 
and recommendations. A young man and a young woman on the 
sample of a brief conversation and chance acquaintance may 
make a date, and later, on the inadequate sample of a few months’ 
courtship, assume the heavy responsibilities of married life. Be- 
cause life is so short that we cannot know all about anything, 
we have to make our decisions on the basis of sample information. 
This is the statistical method. Since no sample can be perfect, 
the difference between our methods of making our daily decisions 
and the most refined statistical methods is one of degree and not 
of kind. We literally live by statistics. 

Our common expressions in the home and shop and on the street 
reveal the statistical basis of our daily thought. “You can’t make 
a silk purse out of a sow’s ear” is a homely way of saying that 
there is a high correlation between ability and achievement or 
between character and results, and that this particular individual 
has such a low mark on the X axis that we must expect him to 
have a low mark on the Y scale. When you say, “Birds of a 
feather flock together,” you are indicating that the trend, or 
regression line, of character is positive, that good people seek good 
people and bad people seek bad people, and that the evil characters
with which this particular person associates indicate that he 
is of low character himself. The expression, “Penny wise and 
pound foolish,” means that there is a negative regression line, or 
inverse trend, between saving on small items and wasting on large 
ones. “If you go in at the big end of the horn, you will come out 
of the little end” expresses the same statistical concept. In fact 
the relations of trend or regression as developed in Chapter 11 
and the ideas of correlation analyzed in Chapter 12 are only 
highly exact and mathematical methods of expressing relation- 
ships which we use every day in the common affairs of life. 

The close relationship between statistical theory and methods 
and the universal behavior of mankind may be illustrated at 
great length, but a few more examples will suffice at this point. The expression,
“From Dan to Beersheba,” or “From Maine to California,” 
or “From here to Land’s End,” means a wide range of data. “The 
exception that proves the rule" is a way of saying that the devia- 
tion of the items from the mean, average or trend merely proves 
that there is a stable or dependable average or base. “He is lost 
in the woods," or “He can't see the town for the houses," means 
that one is confused by the vast amount of data and has failed to 
analyze them into their true mean, trend, and relations. 

Suffice it to say that we all, often without realizing it, select 
friends, sweethearts, wives, husbands, business partners, profes- 
sions, automobiles, clothing, real estate, vacations, and invest- 
ments, and avoid enemies, dangers, losses, and death as long as 
we can by means of the basic principles of statistics as computed 
in averages, dispersions, trends, correlations, indexes, errors, and 
sampling. In studying the science and methods of statistics, there- 
fore, we are only acquiring a clearer understanding and more ac- 
curate measurement of principles and rules we have long been 
using and cannot get along without. 

Statistics is the science and method of analyzing groups of 
measurements. A series of measurements having been made of 
stock prices, or distances to the stars, or gas pressures, or yields 
of wheat, or rainfall, or plumbers’ wages, or heights of children, 
or interest rates, or births, or exports, or of anything, statistics 
supplies the theory and methods for classifying, organizing, and 
computing representative values of these measurements. When 
experimenters and investigators have completed their projects 
they often have a body of data so complex, varied, and voluminous 
that their meaning is not clear. The real trends and relationships 
are so obscured by a mass of details that careful analysis is re- 
quired to bring out the significant and principal points. The 
processes and rules for analyzing these numbers so that their true 
inner meaning is revealed constitute the science of statistics. 

One might ask the simple question: How tall are grade school 
children? The attempt to get an answer would raise a number of 
other questions, such as: Which grades, first to six, or first to 
eight? White or colored children? Native white or foreign born? 
Scandinavian or Italian children? etc. 

Having determined the particular population of children to be 
studied, we measure the height of one child. This single datum is 
not sufficient for a statistical study. We then measure another 
child and find that he is taller or shorter than the first. We can 
now make comparisons, but our sample is too small to obtain 
significant results. We, therefore, continue to measure children 
until we have recorded the heights for some thirty, forty, or even 
one hundred, or more. We are perplexed by the wide variation of 
heights. Some children are only forty inches tall; others are 
sixty or even seventy inches tall. The original question remains: 
What is the height of grade school children? The answer requires 
statistics. We cannot use all the numbers as an answer. That 
would be completely confusing. The shortest height does not 
adequately represent the group; it is too short. The greatest 
height has no better claim; it is too tall. We must compute an 
average, or perhaps, various kinds of averages. Even this is not 
sufficient. Measures of scatter of the heights around their mean 
as well as relationship with age, sex, weight, health, race, and 
other qualifying conditions and related variables must be computed 
before we have the full answer. The simple question of the height 
of grade school children requires the science of statistics for an 
adequate answer. 
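The summary measures called for above (an average, the scatter around it, the spread from shortest to tallest) can be sketched in a few lines of Python. The heights below are a small hypothetical sample invented for illustration, not measurements from the text, and `summarize` is a hypothetical helper name. The standard deviation shown is the population form from Python's standard `statistics` module, which divides by the number of items; that is one common choice among several.

```python
import statistics

# Hypothetical sample of grade school children's heights, in inches.
# Invented for illustration; not measurements reported in the text.
heights = [42, 45, 48, 50, 51, 53, 54, 56, 58, 61]

def summarize(sample):
    """Return several summary measures for a list of measurements."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),        # arithmetic average
        "median": statistics.median(sample),    # middle value when sorted
        "shortest": min(sample),
        "tallest": max(sample),
        "range": max(sample) - min(sample),     # spread from shortest to tallest
        "std_dev": statistics.pstdev(sample),   # scatter around the mean (divides by n)
    }

summary = summarize(heights)
for name, value in summary.items():
    # Show floats to two decimals and whole numbers as they are.
    shown = f"{value:.2f}" if isinstance(value, float) else str(value)
    print(f"{name:>8}: {shown}")
```

No single one of these numbers answers the question alone; taken together they describe both the typical height and how widely heights vary.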

Another common question is: What is the price of wheat? This 
query suggests others, such as, What kind of wheat? Spring or 
winter wheat? White or red wheat? Hard or soft wheat? Wheat 
on the farm or in the city elevators? For a complete analysis 
many other questions must be answered to delimit accurately and 
scientifically the measurements desired. Having decided what 
kind of wheat to measure, and where and when to measure its 
price, we still are faced with the problem of differentiating be- 
tween several types of prices and price movements, such as, daily 
changes, seasonal variation, secular trends, and cyclical fluctua- 
tions, besides accidental or occasional factors. An adequate 
measurement of the price of wheat requires the science of statistics. 

Similar requirements would arise in the measurement of the 
tensile strength of steel, the cost of going to college, the cost of 
textbooks, the price of theater tickets, the rent of dress suits, students’
grades, the distance to the moon or Venus, the beginning 
salaries of teachers or engineers, or any other set of measurements.
Any factor for which varying measurements may be obtained
requires statistics for the analysis of the data. Such factors 
are called variables, because they change from one measurement 
to the next. The quantity of rainfall is a variable, changing from 
day to day, and month to month, as well as from year to year. 
Similar variability is manifest in length of day, heat, wind ve- 
locity, road accidents, fuel consumption, food intake, health, the 
death rate, the interest rate, the tax rate, and practically all other 
human activities and natural phenomena. 
DIVISIONS OF STATISTICS 

Statistics as a science may be divided into statistical theory, 
statistical methods, and statistical results or measures. 

Statistical theory is largely mathematical, deductive, and logical. 
It rests on the inherent relationships of numbers as expressed in 
algebra, geometry, analytics, and calculus. Statistical theory 
explains the reasons for and the relationships between statistical
methods, computations, and results. It supplies the logical 
outline and scheme of relationships on which any sound and de- 
pendable analysis of numbers must rest. Statistical theory ex- 
plains why the various statistical methods must be set up in 
specific forms for particular results and the degree to which each 
statistic is dependable. It is the body of quantitative and numer- 
ical logic which guides the statistician to desirable and dependable 
results. Clear and accurate thinking is as necessary in the analysis 
of numbers as in any other field of science. The student should 
be certain always that he understands the full reason for his 
computations and why and to what extent they are significant. 
While a knowledge of calculus is desirable in the study of statistics, 
it is not absolutely necessary. Students with a fair command of 
algebra can operate the formulas and understand their results. 
This text is designed to meet the needs of students who have as 
little as one semester of algebra. With that limited background 
they can manipulate the formulas and have a working idea of 
their meaning. To become an able statistician, however, the student
should have at least two years of mathematics. 

Statistical methods are the devices for achieving the desired ends 
explained in the theory. They are the outward manifestations of 
the inward logical relations of the theory. Since a method is a 
means to an end, its relative excellence and appropriateness de- 
pend on the specific end sought. There are often several ways of 
doing approximately the same thing, or reaching identical results. 
Some methods are shorter, or easier, or give more complete anal- 
ysis, or are more logically related to particular types of data. 
In the following chapters often several similar or slightly varying 
methods will be given. This is done not to confuse the student, 
but to equip him more completely for effective work. Just as a 
dentist or surgeon has many kinds of instruments for different 
operations, and just as an automobile mechanic has many tools 
for the various parts of an automobile, so a statistician must 
have many methods of manipulating different types of data, or the 
same data for various purposes. Geographic, or place data, re- 
quire a type of treatment different in some respects from that re- 
quired by time data. Percentages, ratios, and relative numbers 
generally must be manipulated by another method than that used 
on absolute numbers. Rates of speed, or time changes require 
treatment differing from that required by percentages or place 
data. 

There are various degrees to which data may be analyzed. In 
some cases a simple summation, or a simple average, is all that is 
required. In other cases it is necessary to know the scatter of the 
items around the average of the data. Sometimes one must 
measure the relationship between two or more variables. These 
statistics may be computed in terms of the original data, as pounds, 
bushels, miles, dollars, or any other concrete unit. At other times, 
however, a ratio or percentage comparison is preferred. A farmer 
may say that the average yield of his wheat is 20 bushels, and that 
his poorest acre produced only 7 bushels, while the best acre 
yielded 31 bushels, and that the total production of his farm was 
1500 bushels this year and 1200 last year. These statistics are 
stated as absolute numbers. He might, however, state his results 
as follows: (1) the poorest acre produced 35% of the average, 
(2) the best acre yielded 155% of the average, and (3) the total production
this year was 25% larger than last year. When two or 
more variables are included in one study, a still greater number of 
comparisons and methods of computation become available. 
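The restatement of the farmer's absolute figures as relative numbers is simple arithmetic, and a short Python sketch using the figures given above reproduces the three percentages. The helper names `percent_of` and `percent_change` are illustrative, not standard terms.

```python
# Figures from the farmer's example in the text, in bushels.
average_yield = 20        # average yield per acre
poorest_acre = 7
best_acre = 31
total_this_year = 1500
total_last_year = 1200

def percent_of(part, base):
    """Express `part` as a percentage of `base`."""
    return 100 * part / base

def percent_change(new, old):
    """Percentage increase (positive) or decrease (negative) from old to new."""
    return 100 * (new - old) / old

poorest_pct = percent_of(poorest_acre, average_yield)
best_pct = percent_of(best_acre, average_yield)
growth_pct = percent_change(total_this_year, total_last_year)

print(f"Poorest acre: {poorest_pct:.0f}% of the average")          # 35%
print(f"Best acre: {best_pct:.0f}% of the average")                # 155%
print(f"This year's total: {growth_pct:.0f}% larger than last")    # 25%
```

Note the two different bases: the acre figures are compared with the average yield, while the change in total production is measured against last year's total.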

The total of statistical methods available at present is the result 
of the accumulation of half a century or more. Some of the more 
simple formulas such as summation, arithmetic mean, and ratios 
are as old as mathematics. Some of the more recent methods of 
analysis perfected in the past quarter of a century or even during 
the last decade are much superior to the older methods in both 
completeness and accuracy. The purpose of the author in pre- 
paring this text is to choose from all available methods those which 
are most simple, fundamental, and useful for beginners in statistics. 

A statistic is a summary, or representative measure computed 
from a sample. The average height of grade school children computed
from a sample of individual measurements of heights is a 
statistic. The measure of the range of grade school children's 
heights from the shortest to the tallest is a statistic. The average 
change of the price of wheat from last year to this year, computed 
from a sample of price quotations, is a statistic. Any single computation
or measure based on a sample of data and conforming to 
statistical theory and methods is a statistic. A group of these 
computations is statistics. The presentation or statement of a 
statistic or of several statistics for the information and guidance of 
individuals, corporations, schools, groups, or governments is very 
important, and in our present-day world, is an extensive business. 

Many farmers and small merchants read carefully the price 
quotations, deliveries of grain and livestock, yields, weather re- 
ports, export figures, consumption trends, tables and charts in 
their trade journals as well as in the daily newspapers. These 
persons are consuming statistics. They are attempting to guide 
themselves and to rationalize their business activities by statistics. 
All large corporations, railroads, public utilities, trading and manu- 
facturing concerns depend largely on statistics of one kind or an- 
other for management guidance. The United States Government 
and the several states and most cities are continually preparing 
and publishing large volumes of statistics. If all price statistics 
were removed from all papers, magazines, radio and telegraphic 
reports for a single day, the business world would be paralyzed. 
If all statistics now available were removed from the world for 
one year, utter economic chaos and ruin would result. We live 
in the midst of statistics. We live and direct our lives by means 
of statistics, from prices to weather reports. All human activities 
from our simplest daily purchases to the making of national laws, 
treaties with foreign nations, court decisions, and the prosecution 
of wars turn largely on statistics. 

Statistics as a science in its broader meanings is not only a 
means of guiding our judgment in making the daily decisions of 
life, but also is a device or tool for discovering new truth. It is one 
of the most powerful engines for research. Continually by means 
of statistics new and important relationships are brought to light. 
The value of new fertilizers, of new methods of advertising, of 
new machines, of new chemical processes is measured by statis- 
tics. The meaning of hitherto unknown biological, psychological, 
physical, economic, social, and technical relationships and factors 
is revealed or clarified by statistics. Those who expect to do 
effective research work must be well trained in its theory and 
methods. 

Elementary statistical theory and methods present the basic 
elements which are universal in their application and without 
which one can go no further in the science. They are as useful and 
necessary in agriculture, biology, and genetics as they are in 
economics, sociology, and education. 


REVIEW QUESTIONS 

1. What is the difference between a statistic and statistics? 

2. Explain in detail the difference between statistical theory and sta- 
tistical methods. 

3. What is the statistical meaning of the proverb, “He can’t see the 
forest for the trees”? 

4. What is meant by “consuming statistics”? Who consumes statistics
and why? 

5. What is the meaning of the statement, “Statistics is the science of 
analyzing groups of numbers”? 

6. How is statistics a tool of research? 

7. Does anyone besides statisticians use statistical methods? Explain. 
8. What is the difference between the ordinary “day-by-day common-sense
approach” to the problems of life and the “statistical approach”? 





CHAPTER 2 


THE USE OF NUMBERS 


Those mature students who specialize in advanced statistics 
must necessarily bring to their task an adequate knowledge of 
mathematics including calculus and the theory of probability and 
least squares. Such a training is an advantage in the study of 
elementary statistics, but it is not necessary. This text has been 
prepared for students who understand only arithmetic and simple 
algebra. Although most college students take courses in mathe- 
matics as freshmen, many of them need a review on some essen- 
tial points by the time they begin this course in statistics. This 
chapter is presented to cover the mathematical processes essen- 
tial to an easy understanding of elementary statistics. 

THE FOUR FUNDAMENTAL PROCESSES 

All mathematical computations involve in some form one or 
more of the basic operations of addition, subtraction, multiplica- 
tion, and division. Each one of these processes, when performed 
by itself, is quite simple, but when two or more of them are com- 
bined in the same problem, certain fixed principles must be fol- 
lowed to insure correct results. 

Addition. The order in which numbers are combined in addi- 
tion does not affect their sum. 

$$a + b + c = b + c + a = c + a + b = a + c + b = c + b + a = b + a + c$$

In each case all add to the same sum. This relationship holds 
whether the numbers are all positive, all negative, or are a mix- 
ture of positive and negative numbers. 


   +12     +8     +7     -8     -7     -1
    +8    +12     +1     -4    -12     -8
    +4     +1     +4    -12     -1     -4
    +1     +7     +8     -7     -4    -12
    +7     +4    +12     -1     -8     -7
   ---    ---    ---    ---    ---    ---
   +32    +32    +32    -32    -32    -32

With mixed signs, the numbers +12, -8, +7, +1, and -4 total +8 
in whatever order they are added. 
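The rule that the order of addition does not affect the sum can be checked mechanically. This Python sketch adds the numbers 12, 8, 7, 1, and 4 in every one of their 120 possible orders, once all positive, once all negative, and once with mixed signs, and collects the distinct totals; `sums_over_all_orders` is a hypothetical helper name. Whole numbers are used so that the machine arithmetic is exact.

```python
from itertools import permutations

def sums_over_all_orders(numbers):
    """Add the numbers in every possible order; return the set of distinct totals."""
    totals = set()
    for order in permutations(numbers):
        running_total = 0
        for value in order:              # add one term at a time, left to right
            running_total += value
        totals.add(running_total)
    return totals

all_positive = [12, 8, 7, 1, 4]
all_negative = [-12, -8, -7, -1, -4]
mixed = [12, -8, 7, 1, -4]

for label, numbers in [("all positive", all_positive),
                       ("all negative", all_negative),
                       ("mixed signs", mixed)]:
    print(f"{label}: {sums_over_all_orders(numbers)}")
```

Each set contains exactly one total, which is the rule stated in the text.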


The accurate performance of any mathematical process requires
close attention. If the student’s mind wanders ever so 
little, he must go back to the beginning and go through the entire 
process again. This is true whether a machine is used or the work 
is done mentally. If one is using an adding or other calculating 
machine he must screw his mind down to the immediate task. 
If one’s mind wanders the least, he is likely to place a number in 
the wrong column, as 600 for 60, or transpose numbers, as writing 
79 for 97, or omit a number from the list altogether. Since most 
number work in the modern office and in research work is done 
on machines, the student should master as a part of his training 
in statistics, if he does not already possess this skill, the use of 
calculating machines. If addition must be performed without 
machines, the numbers should be added twice, once beginning at 
the bottom and once from the top. 


     9,752      9,752
     4,876      4,876
       159        159
       483        483
     7,615      7,615
     2,507      2,507
    ------     ------
    25,392     25,392
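The two-way check can be expressed as a small Python function that compares a written total with the column added downward and upward. `check_hand_total` is a hypothetical name. In exact whole-number arithmetic the two passes always agree, so what the sketch really tests is whether the total written down matches them, which catches slips such as a transposed digit.

```python
def check_hand_total(numbers, written_total):
    """Return True if a written total agrees with both directions of addition."""
    top_down = 0
    for value in numbers:            # add from the top of the column
        top_down += value
    bottom_up = 0
    for value in reversed(numbers):  # add again from the bottom
        bottom_up += value
    return top_down == bottom_up == written_total

column = [9752, 4876, 159, 483, 7615, 2507]
print(check_hand_total(column, 25392))   # the total shown in the text
print(check_hand_total(column, 25329))   # same digits with the last two transposed
```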
If there is difficulty in carrying the tens and hundreds figures 
to the next column, the column additions may be written: 

       32
      26
     31
    22
    -----
    25392

and added for the total. 
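The column-by-column method can be sketched in Python as follows: each place (units, tens, hundreds, thousands) is totaled separately, and the partial totals are then combined by their place values, which is the same as writing each one a place further to the left. The sketch assumes non-negative whole numbers; `column_totals` is an illustrative name.

```python
def column_totals(numbers):
    """Total each decimal place separately, units first.

    Assumes non-negative whole numbers. Returns (place_value, column_sum) pairs.
    """
    width = max(len(str(n)) for n in numbers)
    totals = []
    for position in range(width):
        place_value = 10 ** position
        # Digit in this place for each number (0 when the number is too short).
        column_sum = sum((n // place_value) % 10 for n in numbers)
        totals.append((place_value, column_sum))
    return totals

column = [9752, 4876, 159, 483, 7615, 2507]
totals = column_totals(column)
for place_value, column_sum in totals:
    print(f"{place_value:>5}s place: {column_sum}")

# Multiplying each column total by its place value shifts it the
# proper number of places to the left before the final addition.
grand_total = sum(place_value * column_sum for place_value, column_sum in totals)
print(f"Total: {grand_total:,}")
```

The printed column totals, units first, are 32, 26, 31, and 22, the same partial sums used in the hand method.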

Subtraction. The algebraic signs of plus (+) and minus (— ) 
are essential parts of numbers, and the difference between the 
same series of digits varies with the changing of the signs, as in 
subtracting, 

    +12    +12    -12    -12    Minuend
     +7     -7     -7     +7    Subtrahend
     +5    +19     -5    -19    Difference

The results of a subtraction should always be checked by adding 
the subtrahend and the difference, which should give the minuend, as 

     +7     -7     -7     +7    Original Subtrahend
     +5    +19     -5    -19    Difference
    +12    +12    -12    -12    Original Minuend

or if 42,175 - 17,650 = 24,525 then 17,650 + 24,525 = 42,175. 
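The addition check for subtraction can be written in Python as a one-line test applied to a written answer. `difference_checks` is a hypothetical helper; the examples are the signed figures and the 42,175 example from the text, plus one deliberately miswritten answer to show the check failing.

```python
def difference_checks(minuend, subtrahend, difference):
    """Return True if subtrahend + difference gives back the minuend."""
    return subtrahend + difference == minuend

# Signed examples from the text, as (minuend, subtrahend, difference).
examples = [(12, 7, 5), (12, -7, 19), (-12, -7, -5), (-12, 7, -19)]
for minuend, subtrahend, difference in examples:
    ok = difference_checks(minuend, subtrahend, difference)
    print(f"{minuend:+d} - ({subtrahend:+d}) = {difference:+d}  check: {ok}")

print(difference_checks(42175, 17650, 24525))   # correct answer passes
print(difference_checks(42175, 17650, 24255))   # a miswritten answer fails
```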

Multiplication. Multiplication is a short process of addition. 
It consists of taking one number, called the multiplicand, as many 
times as there are units in another number, called the multiplier, 
as 


222 


222 

4 

or 

222 

888 


222 

222 

888 
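As a minimal sketch of the definition above, multiplication by a whole number can be carried out literally as repeated addition. The helper name is hypothetical.

```python
def multiply_by_addition(multiplicand, multiplier):
    """Take the multiplicand as many times as there are units in the multiplier."""
    product = 0
    for _ in range(multiplier):   # assumes a non-negative whole-number multiplier
        product += multiplicand
    return product


print(multiply_by_addition(222, 4))  # 888
```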


In modern statistical work multiplications are usually performed
on electric calculating machines which operate at high speeds.
Wherever possible, as an economy of time and an aid to accuracy,
the student should make use of all such mechanical devices. If,
however, it is necessary to perform mental multiplications, the
task should be performed twice as a check for accuracy. This is
very important. It is a reckless waste of time to work a long
statistical problem and in the last moment, after hours of labor,
find that there is a major error in an early addition or
multiplication which invalidates all the rest of the work.

Squaring numbers is the process of multiplying a number by
itself. It is an inexcusable waste of time to do this if the square
can be read from a table. Most modern texts and workbooks on
statistics contain such Tables of Squares for all whole numbers up
to 1000. Appendix Table IV contains such a table. The working
of statistical problems frequently requires the use of many squares.
From the table they may be read and recorded in one-tenth of the
time that they can be computed even on a machine. Such tables
are of great value to the statistician. He should learn to use
them early in his career.

Division. Division is a short process of successive subtractions. 
It consists of taking from one number, called the dividend, an- 
other number, called a divisor, as many times as there are units 
in a third number called a quotient. 

The statistician finds electrically driven calculating machines 
of the greatest importance in performing divisions. If he has a 
large amount of work to do, the electric calculator is a necessity. 
Most college laboratories, research laboratories, and business 
offices are adequately supplied with such equipment. It is a 
necessary part of a student's training in statistics to learn to
operate such a machine efficiently. This requirement applies es- 
pecially to the processes of multiplication and division. 

If divisions in statistical problems must be performed mentally 
they should be worked twice as a check on the accuracy, or veri- 
fied by multiplication, as 

69)7,863 | 113    Unit Quotient
   69
    96
    69
    273
    207
     66    Remainder

Total Quotient: 113 66/69, or 113.96 as a decimal

Verification:

     113.96
   ×     69
     102564
    68376
    -------
   7,863.24    or 7,863. correct to units

or

      113
   ×   69
     1017
     678
     ----
     7797
   +   66    Remainder
     ----
     7863
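The following Python sketch shows division as successive subtraction, together with the check by multiplication. It assumes whole numbers and a positive divisor. `divide_by_subtraction` is an illustrative name, and Python's built-in `divmod` gives the same quotient and remainder directly.

```python
def divide_by_subtraction(dividend, divisor):
    """Take the divisor from the dividend as many times as possible.

    Returns (quotient, remainder). Assumes whole numbers, dividend >= 0, divisor > 0.
    """
    quotient, remainder = 0, dividend
    while remainder >= divisor:
        remainder -= divisor
        quotient += 1
    return quotient, remainder


quotient, remainder = divide_by_subtraction(7863, 69)
print(quotient, remainder)               # 113 66
# Verification by multiplication: quotient × divisor + remainder = dividend.
assert quotient * 69 + remainder == 7863
print(round(7863 / 69, 2))               # 113.96
```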

ORDER OF PERFORMING OPERATIONS 

The order in which simple additions and subtractions are performed
does not alter their sum,

as 20 - 17 + 35 + 7 - 60 + 24 = 9

or - 17 - 60 + 7 + 24 + 20 + 35 = 9

or + 35 - 17 + 7 + 20 + 24 - 60 = 9

but when parentheses are placed around two or more of the numbers,
the operations within the parentheses must be performed first, as

20 - 17 - (35 + 7 - 60) + 24 = 20 - 17 - (-18) + 24 = 20 - 17 + 18 + 24 = 45

But if the sign in front of the parentheses is changed, all the signs
within the parentheses must be changed or the result will be
altered. If the minus sign (-) in front of the parentheses is changed
to plus, as 20 - 17 - (35 + 7 - 60) + 24 changed to
20 - 17 + (35 + 7 - 60) + 24, the signs of (35 + 7 - 60) must be
changed to (-35 - 7 + 60). If this is done the result is still 45,
for 20 - 17 + (-35 - 7 + 60) + 24 = 45; but if the signs within are
left unchanged, 20 - 17 + (35 + 7 - 60) + 24 = 20 - 17 + (-18) + 24 = 9.

When multiplications and divisions are to be mixed with additions
and subtractions in a problem, the sequence in which the
multiplications and divisions are performed does alter the result, as

20 + 28 ÷ 2 + 40 - 60 × 4 = 16

if all the operations are performed in successive sequence from left
to right, but if the division is performed first and the remaining
operations are again taken in sequence,

20 + (28 ÷ 2) + 40 - 60 × 4 = 56

and 20 + (28 ÷ 2) + 40 - (60 × 4) = -166

and 20 + (28 ÷ 2) + 40 - (-60 × 4) = 314

It is, therefore, preferable in such mixed problems always to
enclose the numbers that are to be multiplied or divided in
parentheses to avoid errors and to make the required operations
perfectly clear.
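The contrast above can be reproduced in Python as a rough sketch. The helper `in_sequence` is hypothetical. It applies the written signs strictly from left to right, as in the first example. Python's own arithmetic, like the modern convention, performs multiplication and division before addition and subtraction, which is why the fully parenthesized form and the unparenthesized Python expression agree at -166.

```python
from fractions import Fraction
import operator

# Map each written sign to the operation it stands for.
OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def in_sequence(*tokens):
    """Evaluate numbers and signs strictly from left to right, ignoring precedence."""
    result = Fraction(tokens[0])
    for sign, number in zip(tokens[1::2], tokens[2::2]):
        result = OPERATIONS[sign](result, Fraction(number))
    return result


print(in_sequence(20, "+", 28, "/", 2, "+", 40, "-", 60, "*", 4))  # 16
print(in_sequence(20, "+", 28 // 2, "+", 40, "-", 60, "*", 4))     # 56, (28 ÷ 2) done first
print(20 + (28 // 2) + 40 - (60 * 4))                              # -166
print(20 + (28 // 2) + 40 - (-60 * 4))                             # 314
```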

COMMON FRACTIONS 


It is often necessary to express statistical quantities and their
relationships in terms of fractions. Not all variables can be stated
as whole units. Quotations of changes on the stock market are
expressed as common fractions. Interest rates are stated in the same
form. The student in statistics will frequently have to manipulate
common fractions.

Addition. In order to add fractions whose denominators are 3, 4,
and 5, some common basis of comparison is essential. This basis is
the lowest common denominator. It is the smallest number that will
contain evenly the separate denominators of all the fractions to be
added. The simplest method of obtaining a common denominator is to
multiply together all the denominators of the separate fractions,
as, in the case above,

3 × 4 × 5 = 60

60 is a common denominator of the three fractions. It is, in this
case, the least common denominator. No smaller number can be
divided evenly by 3, 4, and 5. Having found the common denominator,
the denominator of each fraction is divided into the common
denominator and the numerator of the fraction multiplied by this
quotient. For the fraction with denominator 4, for instance,
60 ÷ 4 = 15, so its numerator is multiplied by 15 and written over 60.
If there are many fractions to be added and if some of their
denominators are identical or are multiples of other denominators,
the least common denominator is obtained as follows:

Fractions to be added: six fractions with the denominators
3, 5, 6, 8, 4, and 18

Divisors    Denominators
   3         3    5    6    8    4   18
   2         1    5    2    8    4    6
   2         1    5    1    4    2    3
             1    5    1    2    1    3

The denominators of the given fractions are divided by any number
which will reduce the size of two or more of them; a denominator
that the number does not divide evenly is brought down unchanged.
When such division is no longer possible, all the divisors and all
the numbers remaining in the last line are multiplied together for
the least common denominator of all the given fractions, as

3 × 2 × 2 × 1 × 5 × 1 × 2 × 1 × 3 = 360

which is the smallest number which can be divided evenly by 3,
5, 6, 8, 4, and 18.
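The divisor table above can be sketched in Python as follows. This is an illustration only, and `lcd_by_division` is a hypothetical name. It tries divisors in increasing order, so its intermediate rows differ from the book's, which starts with 3. The last line and the final product come out the same. The standard library's `math.lcm` (Python 3.9 and later) is shown as a cross-check.

```python
from math import lcm, prod


def lcd_by_division(denominators):
    """Least common denominator by the divisor-table method.

    Returns (lcd, divisors used, numbers left in the last line).
    """
    row = list(denominators)
    divisors = []
    divisor = 2
    while divisor <= max(row):
        # Divide only while the divisor goes evenly into two or more entries;
        # entries it does not divide are brought down unchanged.
        if sum(1 for d in row if d % divisor == 0) >= 2:
            divisors.append(divisor)
            row = [d // divisor if d % divisor == 0 else d for d in row]
        else:
            divisor += 1
    return prod(divisors) * prod(row), divisors, row


result, used, last_line = lcd_by_division([3, 5, 6, 8, 4, 18])
print(result, used, last_line)   # 360 [2, 2, 3] [1, 5, 1, 2, 1, 3]
print(lcm(3, 5, 6, 8, 4, 18))    # 360
```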

Subtraction. The process of subtracting common fractions 
is identical with that of their addition except that the differences 
of their numerators are taken, as 

4“"7=A — ^ = 

or 4 + 4 — - 1 = 60 + 60 " 60 ='^ 

Multiplication. Two or more common fractions are multiplied
by multiplying their several numerators for the product numerator
and by multiplying their several denominators for the product
denominator, as

10/4 × 2/3 × 3/8 × 4/5 = (10 × 2 × 3 × 4)/(4 × 3 × 8 × 5) = 240/480 = 1/2

or, by canceling out factors common to the numerators and
denominators, the process may be shortened: the 4s and the 3s
cancel, 10/5 reduces to 2/1, and (2 × 2)/8 = 4/8 = 1/2.






Division. The division of common fractions is more difficult
than their multiplication. Since division is the opposite of
multiplication, the division of 4/5 by 1/2 may be accomplished by
multiplying the numerator of the dividend by the denominator of
the divisor and multiplying the denominator of the dividend by
the numerator of the divisor, as

4/5 ÷ 1/2 = (4 × 2) ÷ (5 × 1) = 8/5

or 10/21 ÷ 2/3 = (10 × 3) ÷ (21 × 2) = 30/42 = 5/7

This relationship of cross multiplication is the basis for the
fundamental rule for the division of fractions: Invert the divisor
and multiply, as

4/5 ÷ 1/2 = 4/5 × 2/1 = 8/5

or 10/21 ÷ 2/3 = 10/21 × 3/2 = 30/42 = 5/7
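Python's `fractions.Fraction` keeps numerators and denominators exact and reduces every result to lowest terms, which amounts to the canceling described above. The sketch below reproduces the worked examples. The helper `divide_by_inverting` is a hypothetical name that spells out "invert the divisor and multiply". Plain `/` on two Fractions gives the same result.

```python
from fractions import Fraction as F
from math import prod


def divide_by_inverting(dividend, divisor):
    """Invert the divisor and multiply."""
    inverted = F(divisor.denominator, divisor.numerator)  # turn the divisor upside down
    return dividend * inverted


product = prod([F(10, 4), F(2, 3), F(3, 8), F(4, 5)])  # Fraction reduces automatically
print(product)                                         # 1/2
print(divide_by_inverting(F(4, 5), F(1, 2)))           # 8/5
print(divide_by_inverting(F(10, 21), F(2, 3)))         # 5/7
```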


MIXED NUMBERS 

A mixed number is a whole number and a fraction combined, as

4 1/3, 2 5/8

Such numbers may be added, subtracted, multiplied, and divided
by various methods, but if there are many of them in a problem
it is better, in the case of multiplication and division, to reduce
them to improper fractions and proceed as with fractions. An
improper fraction is one with a larger numerator than denominator,
as

13/3, 21/8

Mixed numbers are changed to improper fractions by multiplying
the whole number by the denominator of the fraction and adding to
this product the numerator of the old fraction for the numerator
of the improper fraction, and using the denominator of the original
fraction for the denominator of the improper fraction, as

4 1/3 = (4 × 3 + 1)/3 = (12 + 1)/3 = 13/3

2 5/8 = (2 × 8 + 5)/8 = (16 + 5)/8 = 21/8

To divide 4 1/3 by 2 5/8 one would reduce to improper fractions,
13/3 ÷ 21/8, and, inverting the divisor, multiply:

13/3 × 8/21 = 104/63 = 1 41/63
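The conversion rule and the division of mixed numbers can be sketched with Python's `fractions.Fraction`. The helper name `mixed_to_improper` is hypothetical. The last step writes the improper result back as a mixed number using `divmod`.

```python
from fractions import Fraction


def mixed_to_improper(whole, numerator, denominator):
    """(whole × denominator + numerator) over the same denominator."""
    return Fraction(whole * denominator + numerator, denominator)


a = mixed_to_improper(4, 1, 3)   # 13/3
b = mixed_to_improper(2, 5, 8)   # 21/8
quotient = a / b                 # Fraction division inverts the divisor internally
whole, rest = divmod(quotient.numerator, quotient.denominator)
print(a, b, quotient, f"{whole} {rest}/{quotient.denominator}")
# 13/3 21/8 104/63 1 41/63
```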


DECIMAL FRACTIONS 

In statistical computations decimal fractions are used much 
more frequently than common fractions because they are based 
on the familiar decimal system, and because they may be readily 
computed on all calculating machines. Common fractions are 
converted into decimal fractions by dividing the numerator by 
the denominator, as 


 1/4  =  1.0 ÷ 4.  = .25
 1/3  =  1.0 ÷ 3.  = .3333+
 5/8  =  5.0 ÷ 8.  = .625
13/21 = 13.0 ÷ 21. = .6190+
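The conversion above is a single division. This sketch uses Python's `decimal` module and keeps four significant figures. That setting is an assumption chosen to match the entries above. `to_decimal` is a hypothetical helper name.

```python
from decimal import Decimal, localcontext


def to_decimal(numerator, denominator, digits=4):
    """Convert a common fraction to a decimal by dividing numerator by denominator.

    `digits` is the number of significant figures kept.
    """
    with localcontext() as context:
        context.prec = digits
        return Decimal(numerator) / Decimal(denominator)


for numerator, denominator in [(1, 4), (1, 3), (5, 8), (13, 21)]:
    print(f"{numerator}/{denominator} =", to_decimal(numerator, denominator))
```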

Aliquot Parts. Certain common fractions can be easily and
quickly converted to decimal fractions because they correspond to
aliquot parts of 100, that is, parts contained an exact number of
times in 100. Such fractions are 1/2, 1/4, 1/5, and 1/10. If 100 is
considered as the denominator, the aliquot parts are:

Denominator   Decimal        Part of 100   Fraction
     2          .50            11 1/9         1/9
     4          .25            12 1/2         1/8
     5          .20            14 2/7         1/7
     8          .125           16 2/3         1/6
    10          .10            20             1/5
                               25             1/4
                               33 1/3         1/3
                               50             1/2
Addition of Decimals. In the addition of decimals the decimal
points of the fractions are placed in a vertical line, as

    4.72
     .548
    1.9627
     .5
   14.91
  275.073
  --------
  297.7137

Subtraction of Decimals. In the subtraction of decimal
fractions the decimal points are kept in a vertical line, as

   45.732
 -  9.1468
   -------
   36.5852

or

   16.0798
 -  9.5160
   -------
    6.5638

This rule for both addition and subtraction applies to work on
calculating machines as well as to mental or hand operations.

Multiplication of Decimals. Numbers composed entirely
of decimals or including decimal fractions are multiplied just as
other numbers, and then the decimal point is located in the
product by counting off from the right end of the product as
many digits as there are decimal places in the multiplicand and
multiplier combined, as

     37.654
   ×  2.049
     338886
    150616
   75308
  77.153046

      29.02
   × .05794
      11608
     26118
    20314
   14510
  1.6814188

   .175
 ×  .5
  .0875

If there are fewer digits in the product than there are required
decimal places, ciphers are prefixed to the left of the product until
the required number of digits is present, as in .175 × .5, where
175 × 5 = 875 must be written .0875 to show four decimal places.
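The counting rule can be sketched in Python. The numbers are written as strings so that their decimal places can be counted. The helper names are hypothetical. `Decimal.scaleb` moves the decimal point, which also supplies any leading ciphers.

```python
from decimal import Decimal


def split_decimal(text):
    """Return (digits as a whole number, count of decimal places) for a string like '37.654'."""
    whole, _, fraction = text.partition(".")
    return int(whole + fraction), len(fraction)


def multiply_by_counting(a, b):
    """Multiply as whole numbers, then point off the combined decimal places."""
    digits_a, places_a = split_decimal(a)
    digits_b, places_b = split_decimal(b)
    # scaleb(-n) moves the decimal point n places to the left, prefixing ciphers if needed.
    return Decimal(digits_a * digits_b).scaleb(-(places_a + places_b))


print(multiply_by_counting("37.654", "2.049"))   # 77.153046
print(multiply_by_counting("29.02", ".05794"))   # 1.6814188
print(multiply_by_counting(".175", ".5"))        # 0.0875
```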

Division of Decimals. Numbers composed of decimals or 
containing decimal fractions are divided just as other numbers, 
and then the decimal point is located in the quotient by counting 
off from the right as many digits as the number of decimal places 
in the dividend exceed those in the divisor, as 
24)487.2 | 20.3
   48
     72
     72

.15)598.65 | 3991.
    45
    148
    135
     136
     135
       15
       15

25.)6.25 | .25
    50
    125
    125

25.).748200 | .029928
     50
     248
     225
      232
      225
        70
        50
        200
        200
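The rule for placing the decimal point in a quotient can be sketched in the same way. The helper names are hypothetical. The sketch divides the digits as whole numbers, then points off as many places as those in the dividend exceed those in the divisor. It assumes, as in the examples above, that the division comes out even.

```python
from decimal import Decimal


def split_decimal(text):
    """Return (digits as a whole number, count of decimal places) for a string like '487.2'."""
    whole, _, fraction = text.partition(".")
    return int(whole + fraction), len(fraction)


def divide_by_counting(dividend, divisor):
    """Divide as whole numbers, then point off (dividend places - divisor places)."""
    digits_n, places_n = split_decimal(dividend)
    digits_d, places_d = split_decimal(divisor)
    quotient, remainder = divmod(digits_n, digits_d)
    if remainder:
        raise ValueError("quotient does not come out even; add ciphers to the dividend")
    return Decimal(quotient).scaleb(places_d - places_n)


for dividend, divisor in [("487.2", "24"), ("598.65", ".15"), ("6.25", "25."), (".748200", "25.")]:
    print(dividend, "÷", divisor, "=", divide_by_counting(dividend, divisor))
```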


Ratios 

When a quotient is compared with a unit it establishes a ratio 
between the dividend and the divisor, as 

10/5 = 2, or 10 : 5 :: 2 : 1.

The ratio of 10 to 5 is 2 to 1.

6.25/25 = .25, or 6.25 : 25 :: .25 : 1.

The ratio of 6.25 to 25 is 1/4 or .25 to 1.

500/20 = 25, or 500 : 20 :: 25 : 1.

The ratio of 500 to 20 is 25 to 1.

20/500 = .04, or 20 : 500 :: 1/25 (or .04) : 1.

The ratio of 20 to 500 is 1/25 or .04 to 1.

By comparing any quotient to 1 a ratio is established between the 
dividend and the divisor or between the numerator and the de- 
nominator of the fraction which they compose. 
Percentages

A percentage (by the hundred) is a ratio multiplied by 100, as

10/5 = 2, or 10 : 5 :: 2 : 1, or 10 is 200% (2 × 100) of 5.

5/10 = .5, or 5 : 10 :: .5 : 1, or 5 is 50% (.5 × 100) of 10.

Any ratio expressed as a decimal may be changed to a percentage
by moving the decimal point two places to the right, as

10/5 = 2.00 as a ratio and 200 as a percentage.

5/10 = .5 as a ratio and 50 as a percentage.

4 as a ratio = 4 × 100 = 400%

.25 as a ratio = 25%

2.5 as a ratio = 250%

The computation of percentages involves three factors: base,
rate, and percentage. The symbols are

Base = b

Rate = r

Percentage = p

The usual formulas are,

br = p        $600. × .05 = $30.00

p/b = r       $600.)$30.00 | .05

p/r = b       .05)$30.00 | $600.
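The three formulas can be written as three small Python functions. The function names are hypothetical. The sketch uses the `decimal` module so that dollar amounts keep their written decimal places.

```python
from decimal import Decimal


def percentage_from(base, rate):
    """br = p"""
    return base * rate


def rate_from(percentage, base):
    """p / b = r"""
    return percentage / base


def base_from(percentage, rate):
    """p / r = b"""
    return percentage / rate


b, r, p = Decimal("600"), Decimal(".05"), Decimal("30.00")
print(percentage_from(b, r), rate_from(p, b), base_from(p, r))   # 30.00 0.05 600
```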


Division by Reciprocals 

Since multiplication and division are opposites, it is possible
to divide numbers by multiplying the dividend by the reciprocal
of the divisor, as

10/5 = 2        Reciprocal of 5 = 1/5 = .2
10 × .2 = 2.
The reciprocal of any number is 1 divided by that number.

Number    Reciprocal
   1      1/1  = 1.000000000
   2      1/2  =  .500000000
   3      1/3  =  .333333333
   4      1/4  =  .250000000
   5      1/5  =  .200000000
   6      1/6  =  .166666667
   7      1/7  =  .142857143
   8      1/8  =  .125000000
   9      1/9  =  .111111111
  10      1/10 =  .100000000


Appendix Table IV in this text contains the reciprocals of all 
numbers up to and including 1000. 

Division by means of multiplication by a reciprocal is a great
economy when several numbers are to be divided by the same
divisor. For instance, if fifty numbers are to be divided by 8,
the results could be obtained much more quickly by multiplying
each one of them by .125, the reciprocal of 8, because
multiplication is a more rapid operation than division on a
calculating machine. The student should use this economical method
whenever it is possible.
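A short Python sketch of division by a reciprocal follows. The list of figures is hypothetical data. Because .125 is exact, the two methods agree. The final line shows the point made in the next paragraph: a reciprocal carried to too few places limits the accuracy of every product.

```python
# Hypothetical figures, all to be divided by the same divisor, 8.
figures = [48, 1000, 36, 200, 7]

reciprocal = 1 / 8                 # .125, as it would be read from a table of reciprocals
by_reciprocal = [x * reciprocal for x in figures]
by_division = [x / 8 for x in figures]
print(by_reciprocal)               # [6.0, 125.0, 4.5, 25.0, 0.875]
assert by_reciprocal == by_division

# 1/7 = .142857143, but rounded to four places the reciprocal gives
print(round(700 * 0.1429, 2))      # 100.03, where the true quotient is 100
```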

The degree of error resulting from the use of reciprocals in 
computation depends on the accuracy of other numbers to which 
the reciprocals are applied. Hence, the number of significant 
digits in the reciprocals must be conditioned by the number of 
significant digits in the series upon which they are used. 


ROUNDING NUMBERS 

In computing quotients, ratios, and percentages, as well as in
multiplications, additions, and subtractions, one frequently carries
a result to a larger number of decimal places than the accuracy
of the original data justifies. In such cases it is necessary to cut
off the excess of decimal places. This is called rounding numbers.

In adding or subtracting decimals, decimal places beyond the last
place of the shortest decimal in any one of the numbers are not
significant, as

    6.436
    7.5
    2.81
    5.4
   ------
   22.146     The total should be rounded to 22.1,
              correct to one decimal place.

    6.2629
  - 2.52
   ------
    3.7429    This should be rounded to 3.74,
              correct to two decimal places.

Those figures in a sum, difference, product, or quotient which
express the degree of accuracy in the original data are said to be
significant. In the illustration of the addition above, the 46 is
not significant because two of the numbers are not made definite
beyond the first decimal place. If the figures had been

    6.436
    7.500
    2.810
    5.400
   ------
   22.146

the third decimal place would have been significant because all
the original numbers were measured to the third decimal place.

In multiplication the product is not significant beyond the least
number of significant figures in either multiplicand or multiplier.

In rounding numbers it is customary to follow these rules:

1. If the amount dropped is more than .5 of a unit in the last
place kept, raise the last remaining digit one unit, as 47,652 would
be 48,000 if three places were dropped, and 47,700 if two places
were dropped.

2. If the amount dropped is less than .5, do not raise the last
remaining digit, as 47,449 would be 47,000 if three places were
dropped and 47,400 if two places were dropped.

3. If exactly .5 is dropped, the last remaining digit is raised one
unit if it is odd and remains unchanged if it is even, as 47,500
would be 48,000 if three places were dropped, while 46,500 would be
only 46,000.
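The three rounding rules together are what Python's `decimal` module calls `ROUND_HALF_EVEN`. The sketch below applies them to the examples. `drop_places` is a hypothetical helper. Python's built-in `round` follows the same half-to-even rule for whole numbers, so `round(46500, -3)` also gives 46000.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def drop_places(number, places):
    """Round a whole number to the nearest 10**places by the three rules above.

    ROUND_HALF_EVEN raises the last kept digit when more than half a unit is
    dropped, leaves it when less is dropped, and on exactly half raises it only
    if it is odd.
    """
    unit = Decimal(10) ** places
    kept = (Decimal(number) / unit).quantize(Decimal(1), rounding=ROUND_HALF_EVEN)
    return int(kept * unit)


for number, places in [(47652, 3), (47652, 2), (47449, 3), (47449, 2), (47500, 3), (46500, 3)]:
    print(number, places, drop_places(number, places))
```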
In division there should be no more significant digits in a
quotient than there are in either the dividend or the divisor,
whichever has fewer. If the estimated population of a city is 55,000
and the estimated value of its real estate $66,660,000, the per
capita value of real estate is $66,660,000 ÷ 55,000 = $1,212. Since
the divisor has only two significant figures, although the dividend
has four, the quotient cannot be considered accurate beyond two
significant figures, $1,200.
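Rounding a quotient to a given number of significant figures can be sketched with Python's `decimal` module. `round_significant` is a hypothetical helper. It finds the power of ten of the leading digit with `adjusted()` and rounds to the place that leaves the wanted number of figures.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_significant(value, figures):
    """Round a Decimal to the given number of significant figures."""
    if value == 0:
        return value
    # adjusted() is the power of ten of the leading digit (3 for 1212).
    exponent = value.adjusted() - figures + 1
    return value.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_EVEN)


per_capita = Decimal(66_660_000) / Decimal(55_000)
print(per_capita)                          # 1212
print(round_significant(per_capita, 2))    # 1.2E+3, that is, $1,200
```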


SQUARE ROOTS 

Most texts and workbooks in statistics contain tables of squares 
and square roots of all numbers up to 1,000. Appendix Table IV in 
this text is such a table. If a student requires the square root of 
any whole number up to one thousand, he should read it from the 
table in order to economize time. By this method he can locate 
and record the square roots of a dozen or more numbers in the time 
that would be required to extract one root. If, however, a root 
must be extracted, the following model problems will clearly in- 
dicate the necessary steps. 

1. Beginning with the decimal point, divide the number into
periods of two digits each, as

 6 78. 48 25

2. Take the largest possible square out of the first period on the
left, as

 6 78. 48 25 | 2
 4

In this case it is 4, the square of 2.

3. Subtract the square from the period and bring down the
next period, as

 6 78. 48 25 | 2
 4
 2 78

4. Double the root already found for a trial divisor and divide
it into the digits brought down excluding the last one, as

     6 78. 48 25 | 2
     4
 4 ) 2 78

4 will go into 27 six times.

5. Write the 6 in the quotient and also after the 4 in the divisor,
as

      6 78. 48 25 | 26
      4
 46 ) 2 78
      2 76
         2

and multiply the completed divisor 46 by 6 = 276 and subtract
from the 278, which leaves the remainder 2.

6. Bring down the next period, 48, and proceed as before, as

      6 78. 48 25 | 260
      4
 46 ) 2 78
      2 76
 52 )    2 48

Double the root already found, 26. Divide it (52) into the figures
brought down excepting the last. 52 into 24 will not go. Record
a zero (0) in the quotient as 260.

7. Bring down the next period and proceed as before, as

       6 78. 48 25 | 260
       4
  46 ) 2 78
       2 76
 520 )    2 48 25

Double the root already found, 260, and divide the 520 into the
figures brought down, 24825, excepting the last one. 520 goes
into 2482 four times. Write the 4 in the quotient as 2604 and also
in the divisor as 5204, multiply the completed divisor, 5204, by
4, which equals 20816, and subtract,

        6 78. 48 25 | 26.04 (or 26.05)
        4
   46 ) 2 78
        2 76
 5204 )    2 48 25
           2 08 16
             40 09

Since the remainder, 4009, is more than one-half of the divisor,
5204, the quotient may be written 2605 instead of 2604, or another
period of 00 may be brought down and the root carried to another
decimal place.

8. Beginning at the right, point off as many decimal places in
the root as there are periods to the right of the decimal point in
the original number. This is 2. The root correct to two decimal
places is, therefore, 26.05.

A second example is:

          85. 69 70 00 | 9.257+
          81
   182 )   4 69
           3 64
  1845 )   1 05 70
             92 25
 18507 )     13 45 00
             12 95 49
                49 51
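The steps above can be sketched as an exact whole-number procedure in Python. `long_division_sqrt` is a hypothetical name, and the number is passed as a string so that the periods can be counted from the decimal point. Like the worked examples, it returns the root truncated to the requested places together with the final remainder. Deciding whether to raise the last figure is left to the reader, as in step 7.

```python
def long_division_sqrt(number: str, places: int):
    """Square root by the pairing (long-division) method; returns (root, remainder).

    The root is truncated to `places` decimal places, as in the worked steps.
    """
    whole, _, frac = number.partition(".")
    frac = frac.ljust(2 * places, "0")[: 2 * places]  # pad with ciphers, two per decimal place
    if len(whole) % 2:
        whole = "0" + whole  # periods are counted from the decimal point
    digits = whole + frac
    periods = [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]

    root, remainder = 0, 0
    for period in periods:
        remainder = remainder * 100 + period  # bring down the next period
        trial = 20 * root  # "double the root already found", moved one place left
        x = 9
        while (trial + x) * x > remainder:
            x -= 1
        remainder -= (trial + x) * x  # subtract completed divisor times new figure
        root = root * 10 + x

    text = str(root).rjust(places + 1, "0")
    if places:
        text = text[:-places] + "." + text[-places:]
    return text, remainder


print(long_division_sqrt("678.4825", 2))   # ('26.04', 4009)
print(long_division_sqrt("85.697", 3))     # ('9.257', 4951)
```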


SIMPLE ALGEBRAIC EQUATIONS 

Elementary statistics occasionally requires the solution of simple
simultaneous equations. The equations,

(1) 2X + 4Y = 20

(2) 3X - 2Y = 12

may be solved by substitution, expressing X in terms of Y in each
equation, as follows:

First Method:

2X = 20 - 4Y        X = (20 - 4Y)/2
3X = 12 + 2Y        X = (12 + 2Y)/3

Therefore (20 - 4Y)/2 = (12 + 2Y)/3

Simplifying this equation gives

60 - 12Y = 24 + 4Y
-12Y - 4Y = -60 + 24
-16Y = -36
Y = 36/16 = 2 1/4

Substituting in equation (1)

2X = 20 - 4(2 1/4)
2X = 20 - 9
2X = 11
X = 5.5

Second Method: The same equations may be solved by multiplication
and subtraction, as follows:

(1) 2X + 4Y = 20
(2) 3X - 2Y = 12

Multiply each equation by numbers that will make the coefficients
of X equal, as

3(2X + 4Y = 20) gives 6X + 12Y = 60
2(3X - 2Y = 12) gives 6X -  4Y = 24
Subtracting                16Y = 36
                             Y = 36/16 = 2 1/4

Substituting the value of Y (2 1/4) in equation (1) to obtain the
value of X, we have,

2X + 4(2 1/4) = 20
2X + 9 = 20
2X = 20 - 9
2X = 11
X = 5.5

Proof: 2(5.5) + 4(2 1/4) = 20        11 + 9 = 20
       3(5.5) - 2(2 1/4) = 12        16.5 - 4.5 = 12
In the above example the figures are small and easy of solution,
but in the chapters on Regression and Correlation the student will
find that this method gives numbers so large that they are
cumbersome to manipulate on a calculating machine. To overcome this
difficulty a shorter method has been developed.

Third Method: The equations are solved by division and subtraction,
as

(1) 2X + 4Y = 20

(2) 3X - 2Y = 12

Divide each equation by the coefficient of X in that equation, as

(2X + 4Y = 20) ÷ 2 = (1X + 2Y = 10)          (1')

(3X - 2Y = 12) ÷ 3 = (1X - .6667Y = 4)       (2')

Subtract (1') 1X + 2.0000Y = 10
         (2') 1X - 0.6667Y = 4
                   2.6667Y = 6

                         Y = 6 ÷ 2.6667 = 2.25 (very nearly), or 2 1/4

Substituting Y in equation (1') we have

X + 2.0000 × 2 1/4 = 10
X = 10 - 4 1/2
X = 5.5

By (1) division and (2) subtraction the size of the numbers is
decreased and the arithmetic solution of the problem made much
easier. The student will find it to his great advantage to use
this method exclusively.
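The third method can be sketched in Python for any pair of equations of the form aX + bY = c, given as (a, b, c). The function name is hypothetical. The sketch uses exact fractions, so the .6667 of the hand solution appears as 2/3, and Y comes out exactly 9/4. It assumes that neither equation has a zero coefficient of X and that the two equations are not parallel.

```python
from fractions import Fraction


def solve_by_division(eq1, eq2):
    """Solve a*X + b*Y = c for two equations given as (a, b, c) tuples.

    Divide each equation by its X coefficient, subtract to eliminate X,
    then substitute back into the first reduced equation.
    """
    # Divide through by the coefficient of X so each reads 1X + bY = c.
    _, b1, c1 = (Fraction(v) / eq1[0] for v in eq1)
    _, b2, c2 = (Fraction(v) / eq2[0] for v in eq2)
    # Subtracting the reduced equations removes X.
    y = (c1 - c2) / (b1 - b2)
    x = c1 - b1 * y
    return x, y


x, y = solve_by_division((2, 4, 20), (3, -2, 12))
print(x, y)  # 11/2 9/4
```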

POWERS AND EXPONENTS 

An exponent indicates how many times a number is taken as a
factor to produce a product, as

10 × 10 = 100 = 10², or
10 × 10 × 10 = 1000 = 10³, or
6 × 6 × 6 = 216 = 6³, or
12 × 12 × 12 × 12 = 20,736 = 12⁴, or
a × a × a × a × a = a⁵, or
f × x × x = fx²
The index of a root indicates the degree to which a number is to be
dissolved or broken down into equal factors, as ²√16 = 4. This
means that when 16 is dissolved into two equal factors each one
of them is 4. ³√27 means that 27 broken down into 3 equal factors
equals 3, or one of the three equal factors of 27 is 3. ⁶√64 = 2:
2 is one of the 6 equal factors of 64. √ is called the radical sign
and indicates that a root is to be extracted. One of the most used
powers in elementary statistics is the binomial expansion, (a + b)ⁿ.

Power

  2   (a + b)² = a² + 2ab + b²

  3   (a + b)³ = a³ + 3a²b + 3ab² + b³

  4   (a + b)⁴ = a⁴ + 4a³b + 6a²b² + 4ab³ + b⁴

  5   (a + b)⁵ = a⁵ + 5a⁴b + 10a³b² + 10a²b³ + 5ab⁴ + b⁵

  6   (a + b)⁶ = a⁶ + 6a⁵b + 15a⁴b² + 20a³b³ + 15a²b⁴ + 6ab⁵ + b⁶

  7   (a + b)⁷ = a⁷ + 7a⁶b + 21a⁵b² + 35a⁴b³ + 35a³b⁴ + 21a²b⁵ + 7ab⁶
                 + b⁷

  8   (a + b)⁸ = a⁸ + 8a⁷b + 28a⁶b² + 56a⁵b³ + 70a⁴b⁴ + 56a³b⁵
                 + 28a²b⁶ + 8ab⁷ + b⁸

  9   (a + b)⁹ = a⁹ + 9a⁸b + 36a⁷b² + 84a⁶b³ + 126a⁵b⁴ + 126a⁴b⁵
                 + 84a³b⁶ + 36a²b⁷ + 9ab⁸ + b⁹

 10   (a + b)¹⁰ = a¹⁰ + 10a⁹b + 45a⁸b² + 120a⁷b³ + 210a⁶b⁴ + 252a⁵b⁵
                  + 210a⁴b⁶ + 120a³b⁷ + 45a²b⁸ + 10ab⁹ + b¹⁰


This formula (a + b)ⁿ is used to explain the development of
such equations as

(ΣXY - NX̄Ȳ) / (ΣX² - NX̄²)

and Σx² = ΣX² - (ΣX)²/N

It is also useful in explaining the shape of frequency distributions.
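The coefficients in the table of expansions are binomial coefficients. Python's `math.comb` (Python 3.8 and later) computes them directly. The sketch below regenerates the table lines in the same notation. The helper names are hypothetical, and the superscript formatting only imitates the printed table.

```python
from math import comb

SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def power(symbol, k):
    """Write symbol to the k-th power as the table does: no exponent 1, nothing for 0."""
    if k == 0:
        return ""
    return symbol if k == 1 else symbol + str(k).translate(SUPERSCRIPTS)


def binomial_expansion(n):
    """Terms of (a + b)**n, with coefficient comb(n, k) on a**(n-k) * b**k."""
    terms = []
    for k in range(n + 1):
        coeff = comb(n, k)
        term = power("a", n - k) + power("b", k)
        terms.append((str(coeff) if coeff != 1 else "") + term)
    return " + ".join(terms)


for n in range(2, 11):
    print(n, binomial_expansion(n))
```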


LOGARITHMS 

Logarithms are in a sense the shorthand, at least the short cut, 
of mathematics and as such are of great value in statistical com- 
putations. In multiplying and dividing large numbers and es- 
pecially in extracting high roots, they not only make otherwise 
most difficult operations quite easy, but enable us to use superior 
methods that would be impossible without them. 
The common system of logarithms is based on 10. The log of
a number is the exponent to which 10 must be raised to be equal
to that number.

10 × 10 = 100 = 10²

10 × 10 × 10 = 1,000 = 10³

10 × 10 × 10 × 10 = 10,000 = 10⁴

A log is an exponent. The log of 100 is 2, the log of 1,000 is 3, 
the log of 10,000 is 4, and the log of 1,000,000 is 6. The exponent 
or log of 10, 100, 1,000, 1,000,000 and all numbers consisting of 
1 plus ciphers only are whole numbers, as 1, 2, 3, 6, etc. The 
logarithms of all other numbers, such as 12, 75, 120, 400, 7,000, 
etc. are mixed numbers, or contain a whole number plus a fraction. 
If the logarithm of 100 is 2 and of 1,000 is 3, the logarithms of all 
the numbers between 100 and 1,000 must be more than 2 and less 
than 3, or must be mixed numbers containing fractions. 


Number    Logarithm
  100      2.00000
  105      2.02119
  150      2.17609
  500      2.69897
  800      2.90309
 1000      3.00000


Every logarithm is composed of two parts, the whole number 
which is called the characteristic and the fraction which is called 
the mantissa. In the log of 105 the characteristic is 2. and the 
mantissa is .02119. In the log of 800 the characteristic is 2. and 
the mantissa is .90309. In the case of 100, the characteristic is 
2 and the mantissa is .00000. The mantissa, or fractional part of 
the logarithm, is all that is ever found in Log-Tables, or Tables 
of Logarithms. Although all decimal points are omitted in such
tables it is understood that there is a decimal point before each
mantissa in the table. They must be so used.

The characteristic is determined by the number of digits to the 
left of the decimal point in the original number or anti-log or anti- 
logarithm, as it is called. 100 is the anti-logarithm of the loga- 
rithm, 2. 150 is the anti-logarithm of the logarithm, 2.17609. 
These relationships are illustrated by the following examples:

Number                Logarithm

log of 732,600.       5.86487
log of 73,260.        4.86487
log of 7,326.         3.86487
log of 732.6          2.86487
log of 73.26          1.86487
log of 7.326          0.86487
log of .7326          9.86487 - 10
log of .07326         8.86487 - 10
log of .007326        7.86487 - 10

The characteristic of the logarithm of the number 1 or of any
number larger than 1 is one unit less than the number of digits to
the left of the decimal point in that number, or anti-logarithm.
The characteristic for 732,600 is 5 because there are six digits in
this number to the left of the decimal point. The characteristic
for 73.26 is 1 because there are two digits, 7 and 3, to the left of
the decimal point in this number.

The characteristic of the logarithm of any number less than 1,
or a fraction, is one more than the number of zeros between the
decimal point and the first significant figure and has a minus
sign. The characteristic for .7326, a fraction, is -1, and is written
9 - 10 to facilitate addition and subtraction with other
logarithms. For .07326 the characteristic is -2, or 8 - 10, and for
.007326 it is -3, or 7 - 10.
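The split into characteristic and mantissa can be sketched in Python with `math.log10`. The helper name is hypothetical. Taking the floor of the logarithm gives the characteristic, which is negative for numbers below 1. What remains is the non-negative mantissa, rounded here to the five places of the table above. Negative characteristics are printed in the "9 - 10" style.

```python
import math


def characteristic_and_mantissa(number):
    """Split the common logarithm of a positive number into its two parts."""
    logarithm = math.log10(number)
    characteristic = math.floor(logarithm)
    mantissa = round(logarithm - characteristic, 5)  # five places, as in the table
    return characteristic, mantissa


for number in [732600, 73.26, 0.7326, 0.007326]:
    characteristic, mantissa = characteristic_and_mantissa(number)
    if characteristic >= 0:
        shown = f"{characteristic + mantissa:.5f}"
    else:
        # Write negative characteristics in the "9 - 10" style used above.
        shown = f"{10 + characteristic + mantissa:.5f} - 10"
    print(number, shown)
```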

To multiply numbers add their logarithms.

      100 = 10²                25. = 10^1.39794
  × 1,000 = 10³             ×   3. = 10^0.47712
  100,000 = 10⁵                75. = 10^1.87506

The logarithm of 7,896. is 3.897407, and the logarithm of 95
is 1.977724; therefore, their product is the anti-logarithm of

   3.897407
 + 1.977724
   5.875131, or 750,120.

To divide one number by another subtract the logarithm of the 
divisor from the logarithm of the dividend. 
75 ÷ 3 = 25

Log of 75 = 1.87506
Log of 3  =  .47712
             1.39794

The anti-logarithm of 1.39794 is 25.

1,000,000 has a log of 6
      100 has a log of 2
                       4

1,000,000 ÷ 100 = 10,000

The logarithm of 10,000 is 4.

To raise any number to a high power multiply its logarithm by 
the desired power. 

25² = 625

The logarithm of 25 is 1.39794. 1.39794 × 2 = 2.79588. The
anti-logarithm of 2.79588 is 625. To solve the problem, what is
the cube of 77? One locates the logarithm of 77 in the table. It
is 1.886491. Multiply this logarithm by three.

3 × 1.886491 = 5.659473.

Locate the anti-logarithm in the table. The number is 456,533.

To extract any root of a number, divide the logarithm of the
number by the index of the root. What is the square root of one
million? The logarithm of 1,000,000 is 6. 6 ÷ 2 = 3. The
anti-logarithm of 3 is 1000, the square root of 1,000,000. What is
the 4th root of 1,234? Its logarithm is 3.091315.

3.091315 ÷ 4 = .772829.

The anti-logarithm of .772829 is 5.9269.
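The four log rules can be checked by machine. This Python sketch uses `math.log10` in place of the printed table. The helper `antilog` is a hypothetical name for raising 10 to a power. Machine logarithms carry more places than a five- or six-place table, so the results are rounded to the figures given in the text.

```python
import math


def antilog(logarithm):
    """The number whose common logarithm is given: 10 raised to that power."""
    return 10 ** logarithm


product = antilog(math.log10(7896) + math.log10(95))    # multiply: add logs
quotient = antilog(math.log10(75) - math.log10(3))      # divide: subtract logs
cube = antilog(3 * math.log10(77))                      # power: multiply log by 3
fourth_root = antilog(math.log10(1234) / 4)              # root: divide log by 4
print(round(product), round(quotient, 5), round(cube), round(fourth_root, 4))
# 750120 25.0 456533 5.9269
```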

Such operations can be performed by the use of a table of
logarithms in a small fraction of the time that would otherwise be
required. In fact, the extraction of roots beyond the third degree
is impracticable for the person with no mathematics beyond
elementary algebra without logarithms. Appendix Table V in this text
is a Table of Logarithms. The student should learn to use it
effectively early in the course.
PRELIMINARY EXERCISE IN MATHEMATICAL PROCESSES

Name of Student .... Date

PROBLEMS 


1 . Solve: f 1 = 


2. Subtract f and |- 
Multiply f and f 


3. Solve: 0.95 × 0.0011 and .56 ÷ .08


4. Solve: (.007)²


5. Express 2 and -^5 in decimals, recording result ac- 
curate to two decimal places. 


6. Reduce 2.3127 to 3 decimal places.


7. If A is to B as 8 is to 5, what percent is A of B?
What percent is B of A?


8. If C is 20% greater than D, what is the ratio of D
to C?


9. What percent is 7 in a total of 35? 


10. Extract square root of 130, expressing result accurate
to one decimal place; and extract square root of 0.5,
accurate to two decimal places.


11. Give the characteristics of the logarithms of the following
numbers: 100; 30; 0.3; 1.003


12. The mantissa of 75 is .8751.
The mantissa of 0.25 is .3979.

(a) Give the logarithm of 75/0.25.

(b) Give the logarithm of 75 × 0.25.


Maximum time allowed for the exercise: 30 minutes.

Time actually taken ....
EXERCISES

1. Add: i — ^ — i; i-J-H-J + io; l + l + f — f; f + f — f 
+ i — f. 

2. Simplify: 2 + 4 - (7 - 2) + 14 ~ (6 + 2); 27-(9-8) + 

(7 X 4) - (3 + 2) + (^); 15 + (6 X 2) - (^) -|- 7 ~ (11 - 4). 

3. Add: 16.72, 31.4, 0.7892, 1.698, 375.42, 0.07, 0.00965, 78.95.

4. Extract the square root of: 66.6; 19.754; 629.92; 1,972.8; 15.762;
999.9; 827.26.

5. Divide: 6.25/25.; .144/120.; 92.75/.0015; 1960./.14; .729/.027; 
1728./120.; .00484/11. 

6. Find logarithms of: 375.; 0.42; 1,768,972.; 29.; 0.075; 21.781;
1011.; 12.7; 877.; 6.75; 62.5; .00321.

7. Find antilogarithms of: 2.86147; 0.0019033; 1.636994; 3.94328;
0.024674; 0.753896; 4.22194; 0.29292; 2.775543; 1.03243; 7.688955;
0.002612.


8. Extract the appropriate root of the following numbers by logarithms: 

^1728 *>^0^ ‘^92.765 -^276. ^^lO^OS 

9. Multiply: 7.8 X 16.92; .072 X .065; 25.5 X .04; 18.75 X 1.5; 
.00042 X .005; 16. X .04; .04 X 16. 

10. Divide: .04 ÷ 16.0; 16 ÷ .04; 92. ÷ .023; 720 ÷ .0024; .0096 ÷
24.0; 2.25 ÷ 150.
11. Compute percentages of total for each nation:

Imports of Brazil, 1937

Country             Value (in $1000)
United States           $111,186
United Kingdom            82,241
Argentina                 46,122
Germany                   41,205
France                    24,514




CHAPTER 3 


RATIOS AND PERCENTAGES 


In its simplest form a ratio is a quotient, or the numerical
quantity obtained by dividing one number by another. The quotient
obtained by dividing 75 by 25 is 3. The divisor, 25, is the
base with which 75 is compared. In its complete form the statement
of the ratio is 75 is to 25 as 3 is to 1, or 75 : 25 :: 3 : 1. By
this device the larger numbers are reduced to the simple comparison
of 3 to 1. The ratio of the area of Kansas to the area of
Ohio is 81,774 ÷ 40,740, or 2.007 to 1, or 2 : 1. The ratio of the
area of Texas to the area of Rhode Island is 262,398 ÷ 1,067, or
246 to 1.

Just as the total of a series of numbers is the simplest
representative quantity for the single group, so the ratio is the
simplest form of comparative figure between two or more numbers. We
use such comparisons frequently in the daily affairs of life. "She
is one girl in a million," "He can whip twice his weight," "I
would rather have it ten to one," "Six of one and half a dozen of
the other," "Tweedle-de-dum and tweedle-de-dee," "It isn't half
as good," "Two to one," and many other common expressions
illustrate the universal use of ratios.

Ratios are frequently stated as percentages, as 75 is 300% of 25, or
75 : 25 :: 300 : 100. The area of Ohio is 50% of the area of Kansas, or
40,740 : 81,774 :: 50 : 100. Such expressions as “Production is down
20%,” “Prices are up 10%,” “The cost of living is up 30%,” and “Exports
have fallen off 50%” reveal how important ratios expressed as
percentages are in our modes of thinking and living. We are so
accustomed to thinking in terms of percentages that the ratio 200 : 100
is as clear to most persons as the ratio 2 : 1.
Other bases such as 10, 1,000, 10,000, or even 1,000,000 are used for
special types of ratios. The rule to be followed in selecting the base
is: The base chosen should be sufficiently large to permit the numerator
to be stated as a whole number but should be sufficiently small to
prevent more than three digits appearing in the numerator to the left
of the decimal point. The death rate of Negroes in New York in 1938 is
stated as 14.1 per 1,000. If the base were reduced from 1,000 to 10 the
numerator would appear as .141, which is less easy to understand than
14.1. On the other hand, if the base were changed to 100,000 the
numerator would be 1,410, which violates the basic principle of the
ratio, which is to reduce larger numbers to smaller ones for the sake
of simplicity.
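The base-selection rule can be checked mechanically by trying successive powers of ten. The Python sketch below rests on two assumptions that are not stated in the text: the base is a power of ten, and "stated as a whole number" means the scaled numerator is at least 1. The function name is hypothetical.

```python
def admissible_bases(rate: float, max_power: int = 8) -> list[tuple[int, float]]:
    """Return (base, scaled numerator) pairs allowed by the base-selection rule.

    Assumed reading of the rule: the base is a power of ten, the scaled
    numerator is at least 1, and it has at most three digits to the left
    of the decimal point (i.e. it is below 1,000).
    """
    allowed = []
    for power in range(max_power + 1):
        base = 10 ** power
        scaled = rate * base
        if 1 <= scaled < 1000:
            allowed.append((base, round(scaled, 4)))
    return allowed


# A death rate of 14.1 per 1,000 is .0141 per person.
for base, value in admissible_bases(0.0141):
    print(f"{value} per {base:,}")
```

For .0141 the rule admits bases of 100, 1,000, and 10,000 (1.41, 14.1, and 141). It rejects 10 (.141) and 100,000 (1,410), as the text does. Choosing among the admissible bases is a matter of convention, and death rates are customarily stated per 1,000.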

The number of units in the base should be smaller than the 
number in the original denominator. 


BASES OF RATIOS 

A great variety of bases may be used for computing ratios. Ten of the
most important bases are (1) Total to total, (2) Total to part, (3) Part
to part, (4) Average to part, (5) Former time to present time, (6)
Standard area or distance to given area or distance, (7) Standard unit
(school, family, class, etc.), (8) Arbitrary units (1, 10, 100, 1,000,
etc.), (9) Cause to effect, and (10) Independent variable to dependent
variable. Most of the ratios in general use rest upon one of these
bases.

1. Total to Total. This type of ratio is used to compare one 
entire group with other entire groups. Per capita income is such 
a ratio. The total national income is divided by the total popu- 
lation and the quotient written as a certain number of dollars, as 


$$\frac{\text{National Income}}{\text{Total Population}} = \frac{\$76{,}035{,}000{,}000}{131{,}409{,}881} = \$578.61 \text{ per capita}$$


according to the Statistical Abstract, 1941. All per capita ratios 
fall in this class, such as per capita production of wheat, cheese, 
shoes, steel, etc. and per capita consumption of gasoline, oranges, 
tobacco, etc. For each ratio the total production or consumption 
figure is divided by the total population of the same geographic 
area. It is possible to have per capita ratios for city, county,
state, nation, world or any other unit area. For 1938 the ratio of
national wealth to population is

$$\frac{\text{National Wealth}}{\text{Total Population}} = \frac{\$309{,}430{,}000{,}000}{130{,}231{,}480} = \$2{,}376 \text{ per capita}$$


as given in The Economic Almanac for 1942-1943, by the National 
Industrial Conference Board. 

2. Total to Part. This base of ratios is used to compare an 
item or sub-total of a group to the total of the group. It is used in 
business to compare department or individual results with those 
of the entire plant or industry. The part is divided by the total. 
The ratio is usually expressed as a percentage. This type of ratio 
is illustrated by Table 1. 


TABLE 1

Freight Car Loadings for 1941 by Class of Freight
and Total Reduced to Percentages of Total

Class of Freight        Car Loadings    Percentages
Total                     42,284,927        100.0
Grain & Products           2,022,429          4.8
Live Stock                   650,490          1.5
Coal                       7,590,002         17.9
Coke                         677,634          1.6
Forest Products            2,184,987          5.2
Ore                        2,682,242          6.4
Mdse. L.C.L.               8,041,367         19.0
Miscellaneous             18,435,786         43.6


Source: Association of American Railroads Car Service Division, 1941 
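Each percentage in Table 1 is a part divided by the total and multiplied by 100. A short Python sketch that recomputes the column from the car-loading figures above; the dictionary and function names are illustrative only:

```python
def percentages_of_total(parts: dict[str, int], total: int) -> dict[str, float]:
    """Express each part as a percentage of the total, rounded to one decimal."""
    return {name: round(100 * value / total, 1) for name, value in parts.items()}


car_loadings_1941 = {
    "Grain & Products": 2_022_429,
    "Live Stock": 650_490,
    "Coal": 7_590_002,
    "Coke": 677_634,
    "Forest Products": 2_184_987,
    "Ore": 2_682_242,
    "Mdse. L.C.L.": 8_041_367,
    "Miscellaneous": 18_435_786,
}
TOTAL = 42_284_927  # total as printed in Table 1

pcts = percentages_of_total(car_loadings_1941, TOTAL)
for name, pct in pcts.items():
    print(f"{name:<18}{pct:>6.1f}")
print(f"{'Sum':<18}{sum(pcts.values()):>6.1f}")
```

Rounding each class independently gives 6.3 for Ore and a column that sums to 99.9. The table's 6.4 presumably brings the printed column to exactly 100.0. The eight classes as printed also add to 42,284,937, ten more than the stated total.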

3. Part to Part. Sometimes it is desirable to state the rela- 
tionships among the several parts of a whole in terms of one or 
more of the parts. Overhead costs may be measured as a ratio to 
direct costs. Radio advertising cost may be expressed as a percentage
of magazine or newspaper advertising expense. A family's expenditures
for clothing may be stated as a ratio to food outlay. Sales in the
several sections of a department store may be computed as a percentage
of shoe sales, or piece goods sales, or furniture sales. The value of
the by-products of a factory may be measured as ratios of the value of
the principal product.

4. Average to Part. In a department store the average sales 
for all the departments may be the best base for ranking the sev- 
eral divisions. The average sales of all the salesmen of a store 
may be the best standard for measuring the efficiency of each 
member of the staff. The average teaching load of the members 
of a faculty is perhaps the best standard for measuring the specific 
load of any one member of the group. 

5. Former Time to Present Time. In computing temporal, or time series,
index numbers, a specific former time period is always used as the
base. If No. 2 hard winter wheat sold at $.60 a bushel in 1913, the base
period, and at $1.08 a bushel in 1942, the ratio is 1.08 ÷ .60, or 1.8,
which is stated as a percentage as 1.8 × 100, or 180%. If the wages of
press-drill operators were $.90 an hour in 1926, the base period, and
$1.44 an hour in 1941, the ratio is 1.44 ÷ .90 = 1.6, which expressed as
a percentage is 160%. If the production of bituminous coal was
439,088,000 tons in 1936, the base period, and 453,245,000 tons in
1940, the ratio is

$$\frac{\text{1940 production}}{\text{1936 production}} = \frac{453{,}245{,}000}{439{,}088{,}000} = 1.032, \text{ or } 103.2\%$$
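An index relative of this kind is the current value divided by the base-period value and multiplied by 100. A minimal Python sketch reproducing the three examples above (the function name is hypothetical):

```python
def index_relative(current: float, base_period: float, places: int = 1) -> float:
    """Value for the current period as a percentage of the base-period value."""
    return round(100 * current / base_period, places)


print(index_relative(1.08, 0.60))                # 180.0  wheat, 1913 = 100
print(index_relative(1.44, 0.90))                # 160.0  wages, 1926 = 100
print(index_relative(453_245_000, 439_088_000))  # 103.2  coal, 1936 = 100
```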


6. Standard Area or Distance. It is frequently advantageous 
to use a standard geographic area or political or economic area as 
the base of a ratio, percentage, or index. Population per square 
mile, average production per acre, improved farm land per farm, 
or per county, or per state, are samples of such common ratios. 
Often a standard distance is used as the base, as cost of highways 
per mile, telephone poles per mile, or airplane accidents or deaths 
per 1,000,000 miles flown, gallons of gasoline consumed per hundred
miles traveled, etc. In geographic, economic, business, population
and social analysis, such indexes or ratios are necessary.

7. Standard Conventional Units. Society has developed or accepted many
institutional or custom-controlled units as bases for measuring
relationships. Among these units are the individual, the family, the
farm, the school, the class, precinct, ward, county, state, and nation.
Income per family is such a ratio. It is obtained by dividing the total
income for a given area by the number of families in that area. The
ratio is expressed in dollars per family. The housing ratio of rooms
per family is of the same type. Cattle per farm, library books per
school, failing grades per class, policemen per ward, and miles of road
per county are all ratios based on conventional units. One principal
objection to such units is that they are not uniform among themselves
for any group. The size of the family varies from two persons to
several times that number. Farms vary from three acres to many
thousands of acres. Counties vary greatly in population and area. In
spite of this lack of uniformity in any type of conventional unit, such
ratios persist and are widely used.

8. Arbitrary Units. For any type of scientific study to give 
dependable results, the units of measurement must be definite 
and employed accurately. In many fields it is possible to set up 
exact units arbitrarily which result in more uniform measures 
than is possible with any conventional unit. This is especially 
true in the physical sciences. The volt, ampere, watt, kilowatt, 
B.T.U., foot pound, and horse power, are such units. The ton- 
mile, passenger-mile, and train-mile are arbitrary units used in 
railway transportation. Flight-hours, light-years, pupil-hours, 
class-hours, “standard tests,” degrees of temperature, latitude and
longitude, and even clock hours are a few other examples of the 
many arbitrary units in general use. 

One of the most common units for ratios is one, 1. All per capita
ratios are based on this unit. If the numerator of such a ratio is a
fraction, it is preferable to increase the size of the denominator
until the numerator becomes a whole number. If the ratio were .000273,
for example, it should be written as 27.3 per 100,000.
The denominator should be kept as small as possible but should
be sufficiently large to make the numerator primarily a whole 
number. It is entirely meaningless to say that the death rate is 
.0113 per person. But to state the ratio as a death rate of 11.3 
per 1,000 is easily understood. Arbitrary bases of 1, 10, 100, 
1,000, 10,000 or even 10,000,000 are widely used and readily un- 
derstood. The most general arbitrary base is 100, or a percent. 

9. Cause and Effect. In any comparison in which there is a cause and
effect relation between the quantities, the one judged to be the cause
should be used as the base or denominator in the ratio. Factory output
per man-hour worked, new sales per dollar or per line of advertising,
light or power produced per kilowatt consumed, plant growth per unit of
fertilizer, taxes collected per thousand dollars of property, typing
speed gained per hour of practice, and deaths from syphilis per 1,000
infections are illustrations of this type of ratio.

10. Independent to Dependent. In scientific statistical 
analysis the variables compared are listed as dependent and in- 
dependent. The dependent variable is the one for which values 
are to be computed from related values of the other variable. 
In computing ratios the independent factor should be used as the 
base. A common ratio of this type is pounds of weight per height 
in inches for an individual. Dollar income per dollar of sales, the 
number of theater-goers per 1,000 of population, the ratio of imports
to exports, and the age of wives as a ratio of the age of husbands are
examples of this relationship. The relationship does not require
causation, although a causal relation may exist. All that is necessary
is that the factor considered to be the independent variable be the
base of the ratio.

RATIOS BETWEEN LIKE ITEMS 

In order for a comparison to yield a meaningful, logical and 
significant result, the two quantities compared must be of the 
same quality or kind and must be stated in identical or comparable 
units. In other words, the variables compared must differ in 
quantity only as measured in identical units. The simplest examples 
of this requirement of ratios are those units stated in arbitrary
common denominate numbers such as inches, pounds, feet, tons, 
dollars, gallons, barrels, etc. How many inches make a pound? 
is a meaningless question. How many gallons make a dollar? 
is equally foolish. But the number of pounds of cured pork ob- 
tained per 100 pounds of hogs slaughtered is an important and 
significant ratio. The number of gallons of gasoline obtained per 
100 gallons of crude oil run to stills is a legitimate ratio and very 
valuable as a measure of oil-refining efficiency. The number of 
tons of coke obtained per 100 tons of raw coal placed in the coking 
ovens is a useful and correct ratio. The number of dollars earned 
as income per dollar or 100 dollars invested is an accounting ratio
that is universally employed and understood. 

In death rates and birth rates the comparison is between per- 
sons and persons. The number of persons or individuals dying in 
a given time period is compared with the number of persons or 
individuals living during that period. The vitality of cabbage 
seeds or beet seeds is the ratio of seeds germinating per 100 seeds 
planted. The accuracy of shooting or bombing is measured by 
the ratio of the shots or bombs that hit the target per 100 shots or
bombs fired or dropped. In all the cases listed above and in many 
others which the reader will readily call to mind the ratio is a 
simple statement of the relation between two quantities both 
measured in identical units, pounds to pounds, tons to tons, 
dollars to dollars, person to person, seeds to seeds. Such simple 
ratios are easy to understand and to use. 


RATIOS BETWEEN UNLIKE UNITS 

Business, education, science and the activities of life generally 
are so complex that we are frequently compelled to make com- 
parisons between quantities which are measured in unlike units. 
How many inches make a pound? may be a foolish question, but 
the question, how many inches of sausage make a pound or how 
many inches of quarter-inch rope make a pound is sensible. All 
per capita ratios involve this double unit type of comparison. 
Dollars of income per person, bushels of wheat consumed per per- 
son, gallons of gasoline used per person or per car, or per mile or
100 miles, pounds of beef consumed per person per year, pairs of 
shoes purchased per person per year, kilowatts of electricity used 
per person per year, or per light, or per refrigerator, the hours of 
labor per automobile produced, or house painted, or acre of land 
plowed, or children taught in school, or the tons of coal burned per 
train mile are only a few of the most common ratios of this type. 

The device by which such ratios are made meaningful and 
valid is a common denominator. Inches and pounds, hours and 
miles, dollars and tons and other conglomerates of mixed units 
cannot be compared as such. The common denominator used to 
join such diverse units in a significant ratio is the idea, number 
or QUANTITY. The full statement of such a diverse ratio is the 
number of bushels of wheat per the number of persons consuming 
wheat. The number of gallons of gasoline consumed per the 
number of 100 miles traveled is the full and correct form of the
ratio. The number of tons of coal burned per the number of train 
miles is the real ratio. The actual comparison is not between tons 
and miles, but between number and number, or quantity and quan- 
tity. Such comparisons are as logical and valid as ratios between 
identical units, because the numerator and the denominator of the 
fraction are identical. Both are numbers. This simple device of 
a common denominator makes possible the use of hundreds of the 
most important ratios which otherwise would be meaningless. 


$$\frac{\text{Number of gallons of gasoline consumed}}{\text{Number of miles traveled}}$$

is as valid a ratio as

$$\frac{\text{Gallons of gasoline obtained}}{\text{Gallons of crude oil run to stills}}$$

for in the second case the ratio is between

$$\frac{\text{Number of gallons of gasoline obtained}}{\text{Number of gallons of crude oil run to stills}}$$

In both cases the ratio is between

$$\frac{\text{number}}{\text{number}}$$

In fact most of the ratios of the diverse type in common use depend on
the common denominator of number or quantity. The ratio

$$\frac{\text{Number of children born}}{\text{Number of women of child-bearing age}}$$

is a correct and necessary measure of the birth rate.

The three points for all makers and users of mixed ratios to keep
clearly in mind are:

(1) The units and limitations of the numerator must be clearly
expressed in the ratio.

(2) The units and limitations of the denominator must be clearly
expressed in the ratio.

(3) The relationship between the numerator and denominator must be
clearly stated.

The standardized birth rate is a ratio which will illustrate these points.

$$\frac{\text{Number of live births per year}}{\text{Number of women 15 to 45 years old at the middle of the year, in 1{,}000's}}$$

Point No. 1. The numerator is defined to exclude still births. 
The period measured is limited to a year. No distinction is made 
between races, or national origins. These and other distinctions 
could be made. If it is intended to make them they should be 
clearly stated. 

Point No. 2. The denominator is defined to exclude all women under 15
years of age and all over 45. It is further restricted to women who are
of these ages at the middle of the year; the age limits might instead
have been fixed as of the beginning or the end of the year. Whatever
social or national restriction is placed on the numerator must also be
placed on the denominator.

Point No. 3. The ratio between the numerator and denominator is stated
in terms of 1,000's of women of child-bearing age. A ratio of 141.7
would mean that on the average 141.7 children were born during the
year for every 1,000 women between the ages of 15 and 45 inclusive.
AVERAGING RATIOS

Ratios cannot be correctly averaged by the simple average 
used for absolute numbers. The average of the heights of ten 
men would be found by adding the ten heights and dividing by 
ten. The average of the ratios of labor turnover for factories 
could not be accurately determined by this method, because of 
the difference in the number of employees and separations for the 
various plants. A weighted average must be used in which the original
denominators are used as weights.


           Total No. of    Annual Average    Ratio of Labor
Factory    Separations     No. Employed      Turnover (%)
           per Year
1                  150              460              32.6
2                  792            3,765              21.0
3                2,902            3,658              79.3
Totals           3,844            7,883         Mean 48.7


The individual ratios are

$$\text{1. } \frac{150}{460} = 32.6\% \qquad \text{2. } \frac{792}{3{,}765} = 21.0\% \qquad \text{3. } \frac{2{,}902}{3{,}658} = 79.3\%$$

The ratio of the totals is

$$\frac{3{,}844}{7{,}883} = 48.7\%$$


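If the separate ratios were already available, the same total result
could be obtained by multiplying each ratio by its denominator and
dividing the sum of the products by the sum of the denominators. The
products reproduce the original separations, apart from rounding in
the ratios:

32.6% × 460 ≈ 150
21.0% × 3,765 ≈ 792
79.3% × 3,658 ≈ 2,902
Totals: 7,883 and 3,844

$$\frac{3{,}844}{7{,}883} = 48.7\% \text{ (Weighted Average)}$$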






A simple arithmetic average of the three ratios is 44.3%:

(32.6 + 21.0 + 79.3) ÷ 3 = 132.9 ÷ 3 = 44.3 (Incorrect Average)

If one does not wish to use the large numbers of the original
denominators, he may reduce them to percentages of their total and use
these percentages as weights.

Numbers     Percentages
   460            6
 3,765           48
 3,658           46
 7,883          100

Ratios    Percentage Weights    Products
 32.6          × 6                195.6
 21.0          × 48             1,008.0
 79.3          × 46             3,647.8
               100              4,851.4

$$\frac{4{,}851.4}{100} = 48.5$$

This figure, 48.5, is almost identical with the weighted average of
48.7 obtained above.
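The gap between the simple average and the weighted average can be checked directly. The Python sketch below uses the labor-turnover figures from the table above; the names are illustrative. It shows that weighting each ratio by its own denominator gives exactly the ratio of the totals, while the percentage-weight shortcut gives only an approximation. The code rounds rather than truncates, so the ratio of the totals prints as 48.76 where the text gives 48.7.

```python
separations = [150, 792, 2_902]  # total separations per year, factories 1-3
employed = [460, 3_765, 3_658]   # annual average number employed


def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Mean of the values, each counted in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


ratios = [100 * s / e for s, e in zip(separations, employed)]  # percent
simple = sum(ratios) / len(ratios)            # incorrect average
weighted = weighted_mean(ratios, employed)    # weights = denominators
of_totals = 100 * sum(separations) / sum(employed)

# Shortcut from the text: denominators as whole-number percentages of
# their total, applied to the ratios rounded to one decimal.
pct_weights = [round(100 * e / sum(employed)) for e in employed]
approx = weighted_mean([round(r, 1) for r in ratios], pct_weights)

print(f"simple mean of ratios:   {simple:.2f}")
print(f"weighted mean of ratios: {weighted:.2f}")
print(f"ratio of the totals:     {of_totals:.2f}")
print(f"percentage weights {pct_weights}: {approx:.2f}")
```

The weighted mean and the ratio of the totals agree (48.76) apart from floating-point noise. The simple mean (44.33) understates the combined turnover because it gives the small Factory 1 the same weight as the two large plants.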


POPULAR RATIOS 

There are many ratios in general use, some of them probably familiar to
the student, which illustrate the practical value of ratios in the
analysis of quantitative data. Since a ratio is the simplest numerical
comparison that can be made between two quantities, ratios are widely
used in all statistical studies. The student should become familiar
with those in common use.
Accounting Ratios

1. Current Ratio =

$$\frac{\text{Dollar value of current assets}}{\text{Dollar value of current liabilities}}$$

The current ratio is the total current assets divided by the total
current liabilities. For the ordinary trading concern a ratio of 2 : 1
is usually considered safe. The following figures from Sears, Roebuck
and Co., January 31, 1941, illustrate this ratio:

$$\frac{\text{Current Assets}}{\text{Current Liabilities}} = \frac{\$233{,}447{,}252}{\$61{,}559{,}008} \approx 3.8$$

2. The Net Worth Ratio =

$$\frac{\text{Dollar value of Net Worth}}{\text{Dollar value of Total Debts}}$$

A safe ratio would vary with industries. For railroads the ratio might
be 1 : 1, but for manufacturing concerns it should ordinarily be more
than 2 : 1. For uncertain trading firms it might well be 10 : 1. This
ratio may be illustrated by the figures of Marshall Field & Co.,
Chicago, 1940.

$$\frac{\text{Net Worth}}{\text{Total Debts}} = \frac{\$52{,}760{,}522}{\$37{,}510{,}284} \approx 1.4$$

3. Fixed Assets Ratio =

$$\frac{\text{Dollar value of Net Worth}}{\text{Dollar value of Fixed Assets}}$$

This ratio measures the security of creditors. If the net worth is
large in proportion to fixed assets, creditors are in a more secure
position than when the ratio is low. The balance sheet of Montgomery
Ward & Co., January 31, 1941, will illustrate this ratio.

$$\frac{\text{Net Worth}}{\text{Fixed Assets}} = \frac{\$210{,}252{,}211}{\$49{,}848{,}028} \approx 4.2$$

4. The Turnover Ratio =

$$\frac{\text{Dollar value of Sales}}{\text{Dollar value of Merchandise (or inventory)}}$$

This ratio measures the speed or efficiency with which inventory is
turned into sales. In a chain grocery store the entire stock may be
turned over 25 to 30 times a year. In a furniture store the ratio may
be only 1 : 1 or even 1 : 2. The figures of Sears, Roebuck and Co. for
the year ending January 31, 1941, will illustrate this ratio.

$$\frac{\text{Sales}}{\text{Inventory}} = \frac{\$617{,}414{,}267}{\$129{,}212{,}482} \approx 4.8$$

5. Collections Ratio =

$$\frac{\text{Dollar value of Credit Sales}}{\text{Dollar value of Receivables}}$$

If this ratio is high, collections are good or sales are for cash. 
If this ratio is low, collections on credit sales are poor and cash 
sales are small. 

There are many other accounting ratios of great aid to the 
business manager which cannot be included here. These indicate 
how large a part simple ratios play in the management of even the 
smallest firm. 
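Each of the four accounting ratios illustrated with company figures is a single division, stated as "x : 1". A brief Python sketch that recomputes them from the balance-sheet figures quoted above; the dictionary labels and function name are illustrative. The text gives no figures for the collections ratio, so that ratio is omitted.

```python
statements = {
    "Current ratio, Sears, Jan. 31, 1941": (233_447_252, 61_559_008),
    "Net worth ratio, Marshall Field, 1940": (52_760_522, 37_510_284),
    "Fixed assets ratio, Montgomery Ward, Jan. 31, 1941": (210_252_211, 49_848_028),
    "Turnover ratio, Sears, year to Jan. 31, 1941": (617_414_267, 129_212_482),
}


def to_one(numerator: int, denominator: int, places: int = 2) -> float:
    """Express numerator / denominator as the 'x' in the ratio 'x : 1'."""
    return round(numerator / denominator, places)


results = {name: to_one(*figures) for name, figures in statements.items()}
for name, value in results.items():
    print(f"{name}: {value} : 1")
```

Judged against the text's rule of thumb, the Sears current ratio of about 3.8 : 1 is well above the 2 : 1 considered safe for an ordinary trading concern.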

Agricultural Ratios 

1. The Corn-Hog Ratio =

$$\frac{\text{Dollar value of 100 lbs of Live Hogs}}{\text{Dollar value of 1 bushel of Corn}}$$

Since it requires on the average about 11 bushels of corn to 
produce 100 lbs of live hog weight, the corn-hog ratio should be 
about 11. During the past thirty years it has ranged from 7 to 
17 but has fallen between 9 and 12 about two-thirds of the time. 
When the ratio is high it pays to raise hogs. When the ratio is 
low it pays to sell corn. 

2. Farm Tenancy Ratio =

$$\frac{\text{Total No. of Tenant-Operated Farms in Area}}{\text{Total No. of Farms in Area}}$$

This ratio is usually reduced to a percentage. The ratio is as low as
3% to 4% in counties of New England and as high as 90% in Mississippi.
It measures the percentage of farms that are operated by tenants. The
figures for 1940 are as follows:

$$\text{Maine: } \frac{\text{Tenants}}{\text{Total Farms}} = \frac{2{,}519}{38{,}980}$$

$$\text{N.H.: } \frac{1{,}054}{16{,}554} = 6.3\%$$

$$\text{Bolivar County, Mississippi: } \frac{10{,}843}{12{,}005} = 90.3\%$$
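The tenancy percentages are part-to-total ratios. The Python sketch below recomputes them from the 1940 figures above and prints each one both rounded and truncated to one decimal. The printed 6.3 for New Hampshire matches truncation of 6.37; rounding gives 6.4. The names are illustrative.

```python
import math


def percent(part: int, whole: int) -> float:
    """Part as a percentage of the whole."""
    return 100 * part / whole


def truncate(value: float, places: int = 1) -> float:
    """Drop digits beyond `places` decimals (valid for non-negative values)."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


tenancy_1940 = {
    "Maine": (2_519, 38_980),
    "N.H.": (1_054, 16_554),
    "Bolivar County, Miss.": (10_843, 12_005),
}
for area, (tenants, farms) in tenancy_1940.items():
    p = percent(tenants, farms)
    print(f"{area:<22} rounded {round(p, 1):>5}  truncated {truncate(p):>5}")
```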


Economic Ratios 

1. Man-Land Ratio =

$$\frac{\text{Total Population in Persons}}{\text{Total Land Area in Square Miles}}$$

or for the United States

$$\frac{\text{Population}}{\text{Area}} = \frac{131{,}669{,}275}{3{,}026{,}789} = 43.5 \text{ population per sq. mi.}$$

This is an important ratio in measuring the relation of resources to
population, or population density. The Man-Land Ratio may also be
stated as the number of acres per person, as:

$$\text{Man-Land Ratio} = \frac{\text{Total acres of Tillable Land}}{\text{Total Population of Area}}$$

or for

$$\text{Iowa: } \frac{34{,}281{,}948}{2{,}538{,}268} = 13.5 \text{ acres per capita}$$

$$\text{Indiana: } \frac{19{,}444{,}456}{3{,}427{,}796} = 5.7 \text{ acres per capita}$$

$$\text{Rhode Island: } \frac{233{,}549}{713{,}346} = 0.3 \text{ acres per capita}$$

2. Efficiency Ratios =

(1)

$$\frac{\text{Product}}{\text{Input in Power}}$$

This is an engineering ratio with an economic implication.

(2)

$$\frac{\text{Dollar Value of Products Sold}}{\text{Dollar Value of Production Costs}}$$

This is the economic ratio measuring the profits of business, or the
efficiency of the business as measured by profits. The figures of the
Chrysler Corporation for 1940 illustrate this ratio.

$$\frac{\text{Sales}}{\text{Costs}} = \frac{\$744{,}561{,}239}{\$633{,}606{,}187} \approx 1.18$$

3. Net Labor Turnover =

$$\frac{\text{Total number of replacements for period}}{\text{Average working force for period}} \times 100$$

This ratio may be illustrated as follows, for the A & B Rubber Company.

$$\text{Net Labor Turnover} = \frac{\text{Replacements for Month}}{\text{Average Working Force for Month}} \times 100 = \frac{782}{8{,}265} \times 100 = 9.4\%$$
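The man-land, efficiency, and labor-turnover ratios above are all quotients of a total by a number of units. A compact Python sketch using the figures given in the text; the function name is illustrative. It rounds to one decimal, so the labor turnover prints as 9.5 (782/8,265 is 9.46%), where the text shows 9.4.

```python
def per_unit(total: float, units: float, places: int = 1) -> float:
    """Quotient of a total and the number of units it is spread over."""
    return round(total / units, places)


acres_per_capita = {
    "Iowa": per_unit(34_281_948, 2_538_268),
    "Indiana": per_unit(19_444_456, 3_427_796),
    "Rhode Island": per_unit(233_549, 713_346),
}
sales_to_costs = per_unit(744_561_239, 633_606_187, places=2)  # Chrysler, 1940
labor_turnover = per_unit(100 * 782, 8_265)                    # percent, one month

print(acres_per_capita)
print(sales_to_costs)
print(labor_turnover)
```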


Educational Ratios 

The best known educational ratio is, perhaps, the Intelligence
Quotient, IQ. It is stated as follows:

1. IQ =

$$\frac{\text{Mental age of pupil in years}}{\text{Chronological age of pupil in years}} \times 100$$

If the mental age is 12 years, while the chronological age is 10, the
IQ = 12/10 × 100 = 120. If the mental age is 9 years, while the
chronological age is 11, the IQ = 9/11 × 100 = 82.

^ ^ Test Score 

2. A.Q. - 

The Achievement Quotient is the measure of accomplishment 
as compared with ability. It measures how well the student ap- 
plies himself to the task. 
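The IQ follows the same pattern as the other ratios: a quotient scaled by 100. A minimal Python sketch of the IQ formula above that reproduces its two worked examples; the function name is hypothetical.

```python
def intelligence_quotient(mental_age: float, chronological_age: float) -> int:
    """IQ = mental age / chronological age x 100, rounded to a whole number."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return round(100 * mental_age / chronological_age)


print(intelligence_quotient(12, 10))  # 120
print(intelligence_quotient(9, 11))   # 82
```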


Population Ratios 


1. Crude Death Rate =

$$\frac{\text{Number of deaths occurring in the area during the year}}{\text{Number of people living in that area at the midpoint of that year}} \times 1{,}000$$

This ratio is called the Crude Death Rate because it makes no allowance
for the varying age and sex composition of different populations.
While it may be quite accurate for some areas, in other areas the error
might be large. The Refined Death Rate would as far as possible
eliminate these errors. The ratio is multiplied by 1,000 to express the
rate as a whole number instead of a fraction. Other death rates are the
“corrected,” “adjusted,” and “standardized” rates.

For Wisconsin in 1939, this ratio is

$$\text{Crude Death Rate for Wisconsin, 1939} = \frac{31{,}424}{3{,}137{,}587} \times 1{,}000 = 10.0$$


2. Crude Birth Rate =

$$\frac{\text{Number of births occurring in the area during the year}}{\text{Number of people living in the area at the midpoint of that year}} \times 1{,}000$$

This ratio is called the crude birth rate because the number of births
does not depend on the total population but on the number of women of
child-bearing age. A given population might have a very low birth rate
only because most of that population consisted of old persons. For
Ohio in 1939 this ratio is

$$\frac{108{,}888}{6{,}907{,}631} \times 1{,}000 = 15.7$$


3. General Fertility Rate =

$$\frac{\text{Number of births to women of all ages}}{\text{Number of women 20-44 (or 15-45)}} \times 1{,}000$$

This ratio may be varied to cover only specific age groups or specific
race, residence, nativity, married, or other groups. It is a much more
accurate measure of the fertility of a given population than is the
Crude Birth Rate. For Minnesota in 1939 this ratio was

$$\text{General Fertility Rate for Minnesota} = \frac{\text{Number of Births}}{\text{Number of Women 15 to 44}} \times 1{,}000 = \frac{50{,}237}{634{,}329} \times 1{,}000 = 79.2$$


4. Sex Ratio =

$$\frac{\text{Number of males in population of given area}}{\text{Number of females in the population of same area}} \times 100$$

For Wyoming in 1940 this ratio is 123.8. For Connecticut in 1940 it is
98.9.
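The death, birth, and fertility rates above differ only in what is counted and in which population forms the denominator. Each is scaled to a base of 1,000, and the sex ratio uses the same pattern with a base of 100. A small Python sketch with the text's figures; the function name is hypothetical. Rounding gives 15.8 for Ohio's crude birth rate. The text prints 15.7, which is the same figure (15.76) truncated.

```python
def rate_per(events: int, population: int, base: int = 1_000, places: int = 1) -> float:
    """Events per `base` persons of the population in the denominator."""
    return round(base * events / population, places)


crude_death_wisconsin_1939 = rate_per(31_424, 3_137_587)
crude_birth_ohio_1939 = rate_per(108_888, 6_907_631)
general_fertility_minnesota_1939 = rate_per(50_237, 634_329)  # women 15 to 44

print(crude_death_wisconsin_1939, crude_birth_ohio_1939, general_fertility_minnesota_1939)
```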

Railroad Operating Ratios

1. Revenue per freight train-mile =

$$\frac{\text{Operating freight revenue in dollars for period}}{\text{Total number of freight train-miles for period}}$$

A train-mile is the movement of a train for one mile. Since trains vary
in length and load, the ratio represents the relationship only as an
average of all trains.

$$\frac{\$3{,}537{,}149{,}471}{491{,}127{,}000} = \$7.20$$

(Source: Statistics of Railways in U.S., 1940, Tables 86, 155, 165.)

2. Revenue per ton-mile =

$$\frac{\text{Operating freight revenues in dollars for the period}}{\text{Total number of ton-miles hauled}}$$

A ton-mile is a ton of freight hauled one mile. It is a uniform
standard unit. Since some freight pays a much higher rate than other
kinds of freight, the ratio represents the relationship only as an
average.

$$\frac{\$3{,}537{,}049{,}471}{375{,}368{,}718{,}000} = \$0.0094, \text{ or about 0.94 cent}$$

(Source: Statistics of Railways in U.S., 1940, pp. 154-157.)

3. Traffic Density Ratio =

$$\frac{\text{Total net ton-miles per year}}{\text{Total miles of track operated}} = \frac{375{,}368{,}718{,}000}{362{,}078} = 1{,}036{,}706 \text{ tons}$$

(Source: Statistics of Railways in U.S., 1940, Tables 1-A, 86, 155.)
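The three railroad ratios can be reproduced from the 1940 figures above. This Python sketch makes two assumptions. The revenue figure is taken as $3,537,149,471; the ton-mile example prints $3,537,049,471, and either value gives the same rounded results. The ton-mile total is the 375,368,718,000 used in the traffic-density example. Integer division drops the fraction, which matches the printed 1,036,706 tons. The constant names are illustrative.

```python
FREIGHT_REVENUE = 3_537_149_471  # dollars, as printed in the train-mile example
TRAIN_MILES = 491_127_000
NET_TON_MILES = 375_368_718_000  # as printed in the traffic-density example
MILES_OF_TRACK = 362_078

revenue_per_train_mile = round(FREIGHT_REVENUE / TRAIN_MILES, 2)        # dollars
revenue_per_ton_mile = round(100 * FREIGHT_REVENUE / NET_TON_MILES, 2)  # cents
traffic_density = NET_TON_MILES // MILES_OF_TRACK  # tons; fraction dropped

print(f"${revenue_per_train_mile:.2f} per train-mile")
print(f"{revenue_per_ton_mile:.2f} cent per ton-mile")
print(f"{traffic_density:,} tons per mile of track")
```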
SUMMARY

1. A ratio is a quotient, or a fraction. 

2. A percentage is a quotient times 100. 

3. The base of a ratio is the denominator of the ratio fraction. 

4. Ten of the most common bases of ratios are (1) total to total, (2) total 
to part, (3) part to part, (4) average to part, (5) former time to present 
time, (6) standard area or distance to given area or distance, (7) standard 
unit (school, family, class, etc.), (8) arbitrary units as 1, 10, 100, 1000, 
etc., (9) cause to effect, (10) independent variable to dependent variable.

5. Ratios may be correctly combined or added only when they are 
weighted by the individual ratio denominators. 

6. Both the numerator and the denominator of a ratio must be clearly
defined, and the ratio must state clearly:

(1) The units and limitations of the numerator,

(2) The units and limitations of the denominator, and

(3) The relationship between the numerator and denominator. There must
be a common unit of comparison between them, such as quantity or
number.
REVIEW QUESTIONS

1. What is a ratio? What is the difference between a percentage and 
a ratio? 

2. Explain what is meant by the base of a ratio. 

3. Explain the relationship expressed in each of the following types 
of ratios: 


(a) Total to total 

(b) Part to total 

(c) Part to part 

(d) Average to part 

(e) Former time to present time 


(f) Standard area ratios 

(g) Standard distance ratios 

(h) Arbitrary units ratios 

(i) Cause and effect ratios 

(j) Independent to dependent ratios


4. Explain fully how ratios may be set up between unlike units. 

5. Explain how ratios may be averaged. 
6. Give examples of the following ratios:

(a) Accounting ratios (d) Educational ratios 

(b) Agricultural ratios (e) Population ratios 

(c) Economic ratios (f) Railroad operating ratios 

EXERCISES 1 


1. Compute the man-land ratios for:

State or Region      Area (Sq. Mi.)    Population, 1940
New England                63,206           8,437,290
South Atlantic            268,431          17,823,151
Mountain States           857,836           4,150,003
New York                   47,929          12,470,142
California                155,652           6,907,387
Nevada                    109,802             110,247

2. Compute the ratio of farm land (in acres) to population, using
population as the base:

State          Population, 1940    Farm Land (Acres)
Ohio                 6,907,612           20,888,004
Connecticut          1,709,242            1,289,134
Wyoming                250,742              522,936
Kansas               1,801,028           34,902,226

3. Compute the death rate for:

City                      Population, 1940    Deaths, 1939
Kansas City, Missouri            399,178           4,763
St. Louis, Missouri              816,048           9,973
Minneapolis, Minnesota           492,370           4,827
Detroit, Michigan              1,623,452          13,249

1 As was indicated in the foreword to teachers, the author has in most
cases omitted detailed instructions to the student as to what to do with
the data in the review problems. It is desirable for the teacher to make
the specific assignments as to what detailed computation the student
shall make according to the teacher's particular purpose in each
assignment for each class. These, of course, may vary from semester to
semester and from class to class. By this method the review problems
become tailor-made to fit the students' needs.
4. Compute the crude birth rate for:

City                      Population, 1940    Births, 1939
St. Louis, Missouri              816,048          14,202
Kansas City, Missouri            399,178           6,425
Minneapolis, Minnesota           492,370           9,149
Detroit, Michigan              1,623,452          27,969

5. Compute per capita sales for:

City                      Population, 1940    Retail Sales, 1939
Atlanta, Georgia                 275,294          $169,298,000
Chicago, Illinois              3,396,808         1,514,829,000
Los Angeles, California        1,504,277           782,842,000
Denver, Colorado                 322,412           177,963,000
Buffalo, New York                575,901           250,311,000



CHAPTER 4 


ORGANIZATION OF A STATISTICAL 
PROBLEM 


A statistical study is always set up to answer some question. The
query must be one which can be answered by a quantitative measurement.
How many or how much is the only reply a statistical study can give.
Such questions as What is courage? How courageous is he? What is
virtue? How wise was Solomon? How great a general was Washington?
cannot be answered by statistical methods. A question, to be suitable
for a statistical problem, should be stated in a form in which it can
be answered by absolute or relative numbers. Such questions are: What
was the average yield of wheat per acre in Sedgwick County, Kansas, in
1941? What is the cost of producing coal in Harlan County, Kentucky?
What was the birth rate in Quebec, Canada, in 1940? What is the life
expectancy of school teachers at the age of 25? What were the average
daily sales of variety stores in 1942? What is the average life of
creosoted pine railway ties? What is the relationship of exports of
cotton to the price of American cotton at New Orleans? What change in
the price level occurred from June, 1933, to June, 1937?


A DEFINITE EXPLICIT QUESTION 

Not only must a statistical problem be designed to give a
numerical or quantitative answer to a query, but it must also be
designed to answer explicitly a specific and delimited question.
Such a question as, What is the price of eggs? is not a good question
to be the basis of a statistical investigation. It is much too
indefinite and general. It gives rise to many other questions, such
as, The wholesale or retail price? In Chicago, New York, or on
the farm? Fresh eggs or storage eggs? American-produced eggs or
imported eggs? White or brown? By the dozen or by the pound?
In July or December? These and a score of other subsidiary queries
must all be decided upon before a statistical answer can
be given. The problem to be solved by a statistical study should
be definite as to (1) time, (2) place or area, (3) statistical unit,
and (4) methods or measures required.

Time. It is always necessary to be explicit as to whether the 
problem for solution is posed at a point in time or is to cover a 
specific period of time. The average and range of egg prices 
might be measured for retail stores as of a point or instant of time, 
perhaps as of 10 a.m. on Monday. Egg prices might be measured 
as to change over the period of a year or even several years. The 
two questions are quite different and require very diverse methods 
and answers. 

Place or Area. Egg prices in Kansas will be lower than in
New York City or even in Chicago, because of transportation 
and storage costs. Wheat yields in Texas may be quite different 
from those in Dakota or Saskatchewan. The cost of living may 
be much less in a rural area than in Boston or Washington. In 
any case the area to be covered by the study must be clearly 
stated and delimited. 

Statistical Unit. One of the most important decisions to be 
made by the statistician is the unit of data to be secured and 
measured. At first thought this might seem to be an easy or even 
unnecessary step. 

The unit of measurement applied to the data in any particular 
problem is the statistical unit. The units to be used in many 
studies are determined by custom or law and are a matter of 
general knowledge and are clearly understood. Such statistical 
units are the common physical units of measurement such as the 
pound, ton, inch, foot, yard, mile, hour, day, year, hectare, meter, 
kilowatt, horse power, and light year. In most cases to name the 
physical unit is sufficient explanation. There are some terms,
however, used with more than one meaning, such as ton, bushel,
and pound. A ton, for instance, may be a long ton of 2,240 pounds
or a short ton of 2,000 pounds. A bushel may be measured by
cubic inches or by weight in pounds. For corn on the cob a bushel
is 70 pounds, but for shelled corn it is 56 pounds. A pound may
be avoirdupois, of 16 ounces or 7,000 grains, or it may be troy
weight, with twelve ounces and only 5,760 grains. Whenever there
is a diversity of physical units of the same name, the particular
one used to measure the data at hand should be fully designated.
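One way to follow this rule when working with figures by machine is to give every ambiguous unit a full name in a lookup table. The following is a Python sketch. It uses only the equivalents stated in the paragraph above. The table and function names are hypothetical. A bare "ton" is deliberately left undefined, so an unqualified unit fails instead of being guessed.

```python
# Pounds in one unit; each ambiguous name is split into fully designated units.
POUNDS_PER_UNIT = {
    "long ton": 2_240,
    "short ton": 2_000,
    "bushel, ear corn": 70,
    "bushel, shelled corn": 56,
}

# Grains in one pound of each system.
GRAINS_PER_POUND = {"avoirdupois": 7_000, "troy": 5_760}


def to_pounds(quantity, unit):
    """Convert a quantity to pounds; an undefined unit name raises KeyError."""
    return quantity * POUNDS_PER_UNIT[unit]


def convert_pounds(quantity, from_system, to_system):
    """Convert a weight in pounds from one system to another by way of grains."""
    return quantity * GRAINS_PER_POUND[from_system] / GRAINS_PER_POUND[to_system]


print(to_pounds(10, "long ton"), to_pounds(10, "short ton"))
print(round(convert_pounds(1, "troy", "avoirdupois"), 4))
```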

In many statistical studies no customary or legal unit is
available. In such cases the statistician must arbitrarily create and
define the unit to be used. This situation arises most frequently
in studies in the social sciences. For instance, in investigating
workmen's wages the question arises as to whether we shall
measure wages by the year, month, day, or hour. There is much to be
said for and against each unit of measurement. The amount of
income received in a year sets definite limits to the level of living
which may be maintained. A man who gets $5.00 a day but works
every day in the year may have an income of about $1800.00, but
another who is paid $1.50 an hour but is idle much of the time may
have an annual income of only $1200.00. On the other hand,
hours are of equal length and easy to compare, but work days
vary from 5 to 10 or more hours and work weeks from 30 to 66 or
more hours, while work months vary from 20 or less to 30 days.
Which unit of wage income should be used? The statistician
must define the unit of data before he begins to collect his data.
In each particular problem the unit of measurement must be
uniform throughout the study.
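The arithmetic behind the wage comparison can be sketched in a few lines of Python. The working time is inferred, not stated in the text: about 360 working days gives exactly $1,800 at $5.00 a day (a full 365 days would give $1,825), and $1,200 at $1.50 an hour implies 800 hours worked. The 8-hour day used for the hourly equivalent is a hypothetical assumption.

```python
def annual_income(rate, units_worked):
    """Income for a year = pay rate per unit of time * units actually worked."""
    return rate * units_worked


# Working time inferred from the incomes quoted in the text (assumptions).
daily_worker = annual_income(rate=5.00, units_worked=360)   # $5.00 a day, about 360 days
hourly_worker = annual_income(rate=1.50, units_worked=800)  # $1.50 an hour, 800 hours

# The daily worker's pay per hour, assuming (hypothetically) an 8-hour day.
daily_worker_hourly_rate = 5.00 / 8

print(f"Daily-rate worker:  ${daily_worker:,.2f} a year, ${daily_worker_hourly_rate:.3f} an hour")
print(f"Hourly-rate worker: ${hourly_worker:,.2f} a year, $1.500 an hour")
```

Measured by the hour the second man appears better paid, while measured by the year the first man is. This is why the unit must be fixed before the data are collected.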

Specific Definition of Unit. If a study is to be made of farm 
families, it is necessary to define clearly and explicitly what a 
farm family is. Is a farm family any family living on a farm 
whether it is engaged in agricultural work or not? Many families 
live on farms but work in cities, and have all the farm work done 
by tenants or hired managers and laborers. As far as economic 
production is concerned such a family is as urban as though it 
lived in a city hotel or apartment. The automobile, modern 
highways, and high city rents have caused many urban workers 
to rent or buy homes in the country. Does living in the country
make such a family a farm family? The statistician must decide
before he begins collecting his data. If urban workers living in
the country are defined as not farm families, they will be
automatically excluded from the data. If, on the other hand, all
families living in villages or towns who go out daily to their farms
to work and who are engaged in agriculture to the exclusion of
urban labor are defined as farm families, they must be included
in the data collected. The Federal census actually includes all
families living on farms as farm families. The definition could
be stated either way or in many other forms, but the point of
emphasis is that, whatever the decision is, the definition must be
so crystal clear that no one can misunderstand its meaning or
delimitations. Full understanding of the field of data studied and
accurate logical thinking are essential to the clear definition of a
statistical unit.
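To show how the choice of definition changes the data collected, the following Python sketch writes each definition as an explicit test. The households are hypothetical and stand for the cases described above. The residence rule resembles the census practice the text mentions, and the occupation rule is the alternative. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    name: str
    lives_on_farm: bool
    works_in_agriculture: bool


def farm_family_by_residence(family):
    """Residence definition: every family living on a farm is a farm family."""
    return family.lives_on_farm


def farm_family_by_occupation(family):
    """Occupation definition: only families engaged in agriculture count."""
    return family.works_in_agriculture


# Hypothetical households matching the cases discussed in the text.
FAMILIES = [
    Family("farm owner-operator", lives_on_farm=True, works_in_agriculture=True),
    Family("lives on a farm, works in the city", lives_on_farm=True, works_in_agriculture=False),
    Family("lives in a village, farms daily", lives_on_farm=False, works_in_agriculture=True),
]

for rule in (farm_family_by_residence, farm_family_by_occupation):
    included = [f.name for f in FAMILIES if rule(f)]
    print(f"{rule.__name__}: {included}")
```

Each rule admits two of the three households, but not the same two. The definition, not the households, decides what the data will contain.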


Illustrative Problem Number One

How many inches high should the seats be for first-grade school
children? We have all observed little children sitting in chairs
made for adults and have noted their evident discomfort. Since
physical comfort is essential to maximum mental concentration, it
is important that the seats in the school room be of the correct
height. What height is correct? This is a question to be answered
by a statistical study. It could be approached in two ways.
Physicians and insurance companies have height-age charts which
show the expected average height, or normal height, of children
for all ages from 1 year to 16 years. Since it is a matter of
knowledge from previous studies that a seat for any person should be
approximately one-fourth his height, one could divide the average
height of six-year-old children as shown on the height-age chart
by four and quickly ascertain the height at which the first-grade
seats should be placed. This method is based on secondary data,
which are data already in existence and collected for other purposes.
The dependability of the results obtained from such studies would,
of course, rest on the reliability of the secondary data used. The
statistician should check all these points thoroughly; otherwise
his conclusions may be false. The detailed methods used in
gathering and testing secondary data are treated in the following
chapter. We are interested at this point only in the method to be
used in setting up a statistical problem.
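The secondary-data method above is one division, shown here as a Python sketch. The 46-inch chart reading is a hypothetical placeholder, not a figure from any height-age chart. The ratio of one-fourth is the rule of thumb stated in the text.

```python
def seat_height_from_stature(average_height_in, ratio=0.25):
    """Estimate seat height in inches as a fixed fraction of standing height."""
    if average_height_in <= 0:
        raise ValueError("height must be positive")
    return average_height_in * ratio


# Hypothetical height-age chart reading for six-year-old children, in inches.
chart_height_in = 46.0
print(f"Estimated seat height: {seat_height_from_stature(chart_height_in):.1f} inches")
```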

Use of Secondary Data. When any statistical question is 
posed, the logical first step is to discover whether an adequate 
answer or even a partial answer is already available. If it is, no 
further time, labor, or expense is required. In most cases,
perhaps, some usable information will be discovered, but it may be
insufficient for a conclusive answer. It may be out of date, in
improper form, in insufficient quantity, from the wrong area,
indefinite, mixed with extraneous materials, or simply incorrect.
Whenever adequate secondary data can be found they should be
studied for their answer to the question or at least for their
suggestions and implications as to what is lacking or required. It
is a foolish waste of time and money to duplicate adequate answers
to present questions.

Primary Data. If the data already available are insufficient 
to answer the question, new data must be procured. Primary 
data are data obtained first hand by the investigator for a specific 
problem. The first step in gathering primary data is to deter- 
mine exactly what data are required for the purpose at hand, and 
to define the unit of data. In the case of the seats for first-grade 
children, the data should be the measures of the heights of a 
sufficient number of six-year-old children, or measurements in 
inches of the length of their legs from the floor to the knee. If 
the entire height of the child is the statistical unit, this figure 
must be divided by some number which will reduce total height 
to seat height. If such a dependable ratio is known, this method 
would be adequate, but if there is doubt as to the reliability of 
such a ratio, a better result would be obtained by measuring the 
actual length of the children's legs from the floor to the knee. 

Selecting the Sample. If knee-height-in-inches is chosen 
as the statistical unit, the next question to be decided is how many 
children should be measured and which children. A detailed ex- 
planation of sampling is presented in Chapter 14. All that we 
wish to emphasize at this point is that in laying out a statistical 
study, the question of sample size and characteristics must be
decided upon after the statistical unit has been defined and before
the data are collected. One might decide to use all the children
of only one first-grade class. For several reasons this might be an
inadequate sample. A small class of ten or twelve children
probably would be insufficient. If this class contained only children
who were large or small for their age, or were from a racial group
shorter than the average or taller than the average of the total
population, or contained crippled children, the results would not
be valid for general use. For securing a general rule for seat
height the sample should cover a larger population and be based
on random unbiased sampling. The end to be secured in sampling
is to obtain the best possible representation of a large population
by a small portion of that population. This might be done by
taking five children at random from each of ten, or perhaps more,
first-grade rooms of a city school system.
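Drawing "by lot" from each room can be imitated with a seeded random generator. The following Python sketch uses the twenty rooms and five children per room that are proposed in the summary of this study. The room rosters are hypothetical, and the seed only makes a drawing repeatable.

```python
import random
from typing import Dict, List, Optional


def sample_by_room(rooms: Dict[str, List[str]], per_room: int,
                   seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Draw `per_room` children at random, without replacement, from each room."""
    rng = random.Random(seed)  # seeded so that the drawing by lot can be repeated
    sample = {}
    for room, children in rooms.items():
        if len(children) < per_room:
            raise ValueError(f"{room} has fewer than {per_room} children")
        sample[room] = rng.sample(children, per_room)
    return sample


# Hypothetical rosters: 20 first-grade rooms of 25 to 34 numbered children each.
rosters = {
    f"Room {r}": [f"R{r}-child{n}" for n in range(1, 26 + r % 10)]
    for r in range(1, 21)
}
chosen = sample_by_room(rosters, per_room=5, seed=1944)
total = sum(len(kids) for kids in chosen.values())
print(f"{len(chosen)} rooms, {total} children in the sample")
```

Sampling a fixed number from every room keeps any single classroom from dominating the sample.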

Methods of Analysis. Most of this book, from Chapter 9 
to Chapter 24, is devoted to theories and methods of analyzing 
statistical data. At this point it is not desirable to anticipate 
those detailed analyses, but merely to point out that the methods 
to be used will necessarily determine to some degree the type of 
statistical unit to be used and the type and quantity of data to be 
collected, and, therefore, that the methods of analysis to be em- 
ployed in a particular study should be largely decided before any 
data are collected or the study begun. One point which the
student must fix clearly in mind is that all statistical studies should be
planned carefully before the data are gathered or the analysis attempted.


Summary of School Seat Study 

1. We might select Cincinnati, Ohio, as a city of typical Amer- 
ican population for the location of the study. 

2. Our statistical unit would be the length of the legs of
six-year-old children, measured in inches from the soles of their
feet to their knees.

3. Our sample would be five children taken at random from
twenty first-grade rooms of six-year-old children by giving each
child in each room a number and then selecting by lot five
numbers from each room, or 100 in all.

4. The analysis would consist of (1) computing a simple average
for the 100 items and (2) computing the appropriate measure of the
scatter of the items about their mean.
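Step 4 can be sketched with Python's standard `statistics` module. The text does not name the measure of scatter. The standard deviation is assumed here, computed about the mean over all items measured (`pstdev`); `statistics.stdev` would divide by n - 1 instead. The five knee heights are hypothetical stand-ins for the 100 measured children.

```python
import statistics
from typing import Sequence, Tuple


def summarize(knee_heights_in: Sequence[float]) -> Tuple[float, float]:
    """Return the simple average and the standard deviation about that average."""
    mean = statistics.mean(knee_heights_in)
    spread = statistics.pstdev(knee_heights_in, mu=mean)  # population standard deviation
    return mean, spread


# Hypothetical knee heights in inches; a real study would use the 100 sampled children.
knee_heights = [11.0, 11.5, 12.0, 12.5, 13.0]
mean_in, spread_in = summarize(knee_heights)
print(f"Mean {mean_in:.2f} in., standard deviation {spread_in:.2f} in.")
```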

Illustrative Problem Number Two 

A large nursery which supplies shade trees for St. Louis de- 
sires an answer to the question, What are the five most numerous 
varieties of shade trees in St. Louis and what percentage of the 
total of all shade trees does each of these five varieties compose? 

The plan for the study is as follows: 

1. Include in this study the entire metropolitan area including 
all adjacent suburbs. 

2. Make the study in June when all trees are in full leaf. 

3. Mark on a detailed map of the city every tenth block east 
and west beginning with the blocks bordering on the west side of 
Kings Highway and its most direct extensions to the north and 
south ends of the city. Mark on the map every tenth block 
north and south of Olive Street and its extension to the west in 
Washington Avenue from the Mississippi River to the west side 
of the metropolitan area. This would include one marked block 
in every 100 square blocks of the city. 

4. Count all the shade trees on all four sides of each of these 
selected city blocks, recording the number of trees in each variety. 

5. (1) Select from the tally sheets the five most numerous 
varieties. (2) Reduce the total for each variety to percentages of 
the grand total. (3) Multiply the grand total by 100 to obtain 
the approximate total of shade trees in the entire city. 
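Step 5 of the plan is simple bookkeeping, sketched here in Python with `collections.Counter`. The tally counts are hypothetical, and the variety names are taken from the questions raised below, not from any survey. The factor of 100 follows from counting one block in every 100.

```python
from collections import Counter

SAMPLING_FACTOR = 100  # one block in every 100 square blocks is counted (step 3)


def summarize_tally(tally, top=5):
    """Return the leading varieties with their percentages of the grand total,
    the grand total itself, and the estimated total for the whole city."""
    grand_total = sum(tally.values())
    leaders = [
        (variety, count, 100 * count / grand_total)
        for variety, count in tally.most_common(top)
    ]
    return leaders, grand_total, grand_total * SAMPLING_FACTOR


# Hypothetical totals from the tally sheets of the marked blocks.
tally = Counter({
    "soft maple": 420, "hard maple": 310, "white oak": 150, "red oak": 120,
    "cedar": 90, "pine": 60, "post oak": 50,
})
leaders, grand_total, city_estimate = summarize_tally(tally)
for variety, count, pct in leaders:
    print(f"{variety:<12} {count:5d} {pct:5.1f}%")
print(f"Sample total {grand_total}; estimated city total {city_estimate:,}")
```

The percentages are taken of the grand total of all varieties, as step 5 directs, so the five leaders need not add to 100 per cent.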

The above organization outline is a rough statistical approach 
to a problem which does not require results of great accuracy. 
It does, however, raise several questions. (1) What are to be 
considered separate varieties of trees? (2) Would the soft maple 
and the hard maple be designated as two separate varieties? 
(3) Would all oak trees be considered as one variety, or would 
red oaks be segregated from white oaks, post oaks, and jack oaks? 
(4) What lines, if any, would be drawn among the various types
of pine trees and of cedar trees? (5) Would trees of all ages be
counted or only trees 10 feet tall and over? (6) Or would it be 
of more use to the nursery to count only young trees less than 
15 feet tall as indicative of present tastes in shade trees? (7) How 
far can one judge the age of trees by their height, since some are 
of very slow growth while others attain an equal height in about 
half the time? These and many other questions would have to be 
decided by the statistician before his research plan was perfected 
sufficiently to begin gathering data. One might even raise the 
question as to whether one square block in a hundred gave a 
sufficiently large sample to be representative of all parts of the 
city. Another vital query is whether the large commercial and 
industrial sections should be included in the study at all, since 
they would contain fewer trees and would be quite different from 
the residential areas. To exclude them would raise the difficult 
question as to when a section or block is residential or industrial 
or commercial. It is evident from these questions that those who 
plan such a study would have to give long and careful thought to 
its proper organization. 

Illustrative Problem Number Three

As a means of aiding the Government to control prices more 
effectively after World War II, it is desired to create a weekly 
index number that will measure changes in the general price level. 

The organization of this problem should include the following 
points : 

1. Statistical Unit — the wholesale prices of commodities and 
services in dollars. 

2. Time — measured each week as of a certain day, as Friday, 
or the average of the high and low of the week. 

3. Area — the United States: a certain number of selected
cities of various sizes, and certain central markets.

4. Method — a weighted geometric average of price relatives. 
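The method named in point 4 can be sketched in Python. Each price relative is the current price divided by the base-period price. The weighted geometric average is formed from the weighted mean of their logarithms and multiplied by 100. The prices and weights below are hypothetical, and the choice of weights is exactly one of the questions the text says must be settled first.

```python
import math


def weighted_geometric_index(base_prices, current_prices, weights):
    """Return 100 times the weighted geometric mean of the price relatives p1/p0,
    computed as exp(sum(w * ln(p1/p0)) / sum(w))."""
    if not (len(base_prices) == len(current_prices) == len(weights)):
        raise ValueError("prices and weights must have the same length")
    total_weight = sum(weights)
    weighted_logs = sum(
        w * math.log(p1 / p0)
        for p0, p1, w in zip(base_prices, current_prices, weights)
    )
    return 100 * math.exp(weighted_logs / total_weight)


# Hypothetical base-week and current-week prices for three commodities.
base = [2.00, 0.50, 10.00]
current = [2.20, 0.50, 12.00]
weights = [3, 5, 2]
index = weighted_geometric_index(base, current, weights)
print(f"Index number: {index:.1f}")
```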

In teaching statistics it seems desirable to put everything first,
but that is impossible. The technical terms and methods
mentioned in this chapter must wait their appropriate place in the
body of the text for explanation. All that we wish to emphasize
at this point is that the organization of a statistical problem
requires:

1. A thorough knowledge of the field in which the problem falls. 

2. An adequate knowledge of statistical theory and methods, 
and 

3. A thorough planning of the study as to all essential details 
and parts before the actual collection and analysis of data are 
begun. 

In the case of such an index a large number of preliminary 
questions would have to be settled. Should one use retail or 
wholesale prices? How many separate prices should be included? 
Should the prices be weighted, and if so, with what weights? 
Should the prices of services be included? Should one take the 
(1) average price for the six days of the week, (2) the mean of the 
highest and lowest price, or (3) the price as of a point of time, 
perhaps 2 p.m. on Friday? What should be the standard or base 
period? These are merely samples of the many details which 
would have to be thought out in planning the study. Many 
times students have come to the author to help them clear up 
some point in a statistical problem which could not possibly be 
answered with the data in the student's possession because he
had failed to include this particular point in the data when he 
gathered the information. Such a back-handed procedure is like 
putting a foundation under a house after the building is up and 
the plaster is on the walls. All information to be used in the study 
must be included in the original data. 

Illustrative Problem Number Four 

A farm experiment station wishes to test the effectiveness of 
two poisons sold in the market for killing cotton boll weevil. How 
should the experiment be set up? 

1. The statistical unit would be a boll weevil. 

2. The area would be a portion of an infected cotton field. 
3. The time, preferably August or some other period when the boll
weevil are quite numerous.

4. The method would be rather complex. It would require a 
carefully planned series of replications, or blocks of infested cotton 
bolls, laid out in strips or squares in such an order that both poi- 
sons would be applied to blocks or areas that were equally infested 
with the boll weevil. If the weevil were only one-half as numer- 
ous where one poison was applied as they were where the other 
poison was used, the results would not be comparable. To make
the degree of the effectiveness of the poisons clear there should 
be a check block where no poison was applied, but in which the 
weevil were equally numerous. The theory and methods of lay- 
ing out such an experiment will be explained in their proper place 
in Part IV and are mentioned at this point only to emphasize 
the fact that the more exact and scientific are the results desired, 
the more exact and detailed must be the plans for the statistical 
study. 
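Part IV gives the author's own treatment of such layouts. As a placeholder, here is a Python sketch of one common design that meets the stated requirements, the randomized complete block. It assumes each block is a group of adjacent plots with about the same infestation. Every block receives both poisons and the untreated check once, in random order. The treatment labels are hypothetical.

```python
import random

# Hypothetical treatment labels: the two poisons and the untreated check.
TREATMENTS = ("poison A", "poison B", "check (no poison)")


def randomized_block_layout(n_blocks, seed=None):
    """Return one list per block; each block holds every treatment once,
    assigned to its plots in a random order."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        plots = list(TREATMENTS)
        rng.shuffle(plots)  # random order of the treatments within this block
        layout.append(plots)
    return layout


layout = randomized_block_layout(n_blocks=6, seed=1944)
for number, plots in enumerate(layout, start=1):
    print(f"Block {number}: " + " | ".join(plots))
```

Because every treatment appears in every block, a difference in infestation between blocks affects all three treatments alike.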

The number of illustrations could be continued without limit, 
but these are sufficient to suggest to the beginning student the 
great importance of carefully planning all statistical studies. In 
developing the various statistical methods in succeeding chapters, 
the particular types of data which should be treated by particular 
methods are pointed out and emphasized in the discussions. 
The student should note these points and in planning any sta- 
tistical problem should select his methods with the same degree 
of deftness and skill with which the expert craftsman chooses 
and varies his tools according to the nature of his materials and 
the purpose of his operations. The scientific statistician is not 
one who rakes through the garbage pile of numbers and uses 
whatever he may chance to drag out. He is rather an architect, 
an engineer, who carefully designs a numerical structure for a 
specific purpose and selects his materials and methods to that 
definite end. It is well for the elementary student to begin with 
simple problems which lie within the range of his own experience. 
SUMMARY

1. A statistical problem should be set up to answer a specific question. 

2. It should be definite as to (1) time, either point or period of time, 
(2) place or area, (3) statistical unit, and (4) methods or measures re- 
quired. 

3. The statistical unit may be a common denominate number such as 
pounds, dollars, acres, days, etc., or it may be a specially or arbitrarily
devised unit for a specific problem, as farm-family, ton-mile, work-day. 

4. In setting up a statistical problem, the statistician should review 
what has already been done in the field to determine (1) whether further 
study is required, (2) what additional data are required, (3) whether 
modifications and restatements of the problem should be made or whether 
it should be further delimited or expanded. 

5. Many questions should be asked and every phase of the problem 
should be explored and every important factor and variable should be 
clearly defined in its relation to the problem. A complete plan of pro- 
cedure should be worked out before the data are collected or their analysis 
begun. 
REVIEW QUESTIONS

1. What type of questions should a statistical study be expected to 
answer? Why? 

2. How definite and delimited should be the question for a statistical 
study? Why? 

3. What is a statistical unit? How is the statistical unit of a study 
determined? Explain. 

4. What steps should be included in the preliminary work for a sta- 
tistical study? Why? Explain. 

5. What work should be done before the data are collected? Why? 
EXERCISES

I. Outline a statistical study on one of the following questions: 

1. How many pairs of shoes of each size should a shoe store carry in 
stock? 

2. At what age on the average are automobiles junked? 

3. To what degree do high school grades indicate college achievement? 

4. Was R. E. Lee a greater general than U. S. Grant? 

5. Two fertilizers are recommended for yard grass or lawn use. How 
may one determine whether one is superior to the other? 

6. What is the average consumption of cigarettes for a given area, 
such as a city, county, or state? 




CHAPTER 5 

COLLECTING AND EDITING STATISTICAL DATA


The nature and complexity of the question to be answered de- 
termine the kind and quantity of data to be collected. Data are 
classified as primary and secondary. Secondary data are those 
already in existence and which have been collected for some other 
purpose than the answering of the question at hand. Examples of 
secondary data are the accounting records of a store or factory. 
These records are collected in the routine of business to enable the 
accounting department to post the proper charges and credits to 
the appropriate accounts in order to send out statements, make 
collections, and prepare profit and loss statements, and balance 
sheets. But after these data are once in existence, many of them 
can be used later by other persons or agencies for other purposes, 
such as estimations of average daily sales, average daily selling 
expenses, and even budgets for future months or years. When 
used by others than those who collect them they are secondary 
data. Other examples of secondary data are the extensive cen- 
suses and other reports of the United States Government. These 
include the decennial census, agricultural census, census of manu- 
factures, treasury reports, reports on foreign and domestic com- 
merce, and hundreds of other publications. 

Primary data are those collected first hand by the statistician in 
order to answer the question at hand. All data are primary to the 
person or agency which first collects them. 

WHAT HAS ALREADY BEEN DONE?

When a statistical question is raised, the first step in answering 
it is to determine whether an adequate reply is available. If it 
is, no further time or money need be spent. So much research 
has been done in so many fields during the past thirty years that 
many of the questions raised can be answered at once, if a person 
knows where to turn for the information. If a complete answer 
is not available at once, in many cases a sufficient amount of data 
can be discovered to indicate what kind of additional data are 
required and where they may be found. It is frequently dis- 
covered that a beginning has already been made and that all 
that is necessary is to complete it. Sometimes a new regrouping
of the old data is necessary; at other times a more complete analysis
of existing tabulations is all that is required. In any case it is
advisable to exhaust the possibilities of using existing studies,
secondary data, and earlier analysis before new data are collected.
A review of previous studies is valuable in (1) furnishing useful
data, (2) suggesting lines of attack on the problem, (3) revealing
errors to be avoided, and (4) indicating new or other sources of
data to be used.


THE USE OF LIBRARIES 

The most extensive single aggregate of secondary data is usually 
to be found in the libraries — city libraries, corporation or plant 
libraries, and college libraries. Such institutions are permanent 
depositories of governmental reports and publications as well as 
of the books and studies made by many other organizations, 
public and private. Since it is the business of a library staff to 
obtain all possible sources of data and other information for its 
patrons, libraries gradually collect vast stores of useful informa- 
tion which is only awaiting the hand of some one to use it. Large 
city libraries and the libraries of the large universities and
land-grant colleges are especially well stocked with all state and
Federal reports and publications. They also contain extensive files
of metropolitan newspapers and trade magazines. The statis- 
tician, therefore, finds the libraries a rich source of data. 




The novice in the use of library materials will have to depend 
much on the aid of library assistants and clerks. While many of 
the library personnel are efficient in locating book and statistical 
materials, they usually will not know all the sources in which the 
obscure but valuable bodies of data are hiding. They may lay 
before you a series of records or references, but it will still be 
your responsibility to locate the specific data required, and to 
pass judgment on their adequacy and validity. It is entirely 
too much to expect library staffs to do your research for you. 

Since the research worker must depend on his own skill and 
judgment in large measure in locating and interpreting second- 
ary data, he should, as soon as possible, become thoroughly 
familiar with all the library data possible in his and in related 
fields. The best way to do this is for him to secure, if possible, a 
permit to enter the stacks or book storage section of the library 
where he may examine and study at leisure all possible sources of 
data in his field. In an elementary text there is no space for 
lengthy references to individual and specific books and sources, 
but the student should have at his command the principal in- 
dexes of sources in all the major fields of study. A principal part 
of his preparation to do statistical work is a clear knowledge of 
these indexes and how to use them. They are the dictionaries of 
statistical studies or research. 


UNITED STATES GOVERNMENT INDEXES 

The largest collector and publisher of statistical data in the 
world is the Government of the United States. At various in- 
tervals the Government publishes a number of very important 
and inclusive indexes of all these sources of data. The principal 
indexes are: 

1. Catalogue of the Public Documents of the Congress and all 
Departments of the Government of the United States. (Prepared 
under the supervision of the Superintendent of Documents) 

This is the basic index and the student should become thor- 
oughly familiar with its scope and general contents. 
2. United States Government Publications — A monthly catalogue.

This is the monthly supplement of Catalogue of the Public Docu- 
ments listed above and is consolidated at the end of the year into 
the larger volume. It is the current list of Federal publications. 

3. Price Lists of Government Publications. (Especially good for 
those persons ordering sources for themselves.) 

A few of its many fields of listings are: 


No. 10, Laws
No. 11, Foods and Cooking
No. 15, Geological Surveys
No. 19, Army and Militia
No. 20, Public Domain
No. 21, Fishes
No. 24, Indians
No. 25, Transportation
No. 28, Finances
No. 31, Education
No. 33, Labor
No. 38, Animal Industries
No. 62, Commerce and Manufacturing
No. 71, Children's Bureau

Dozens of other fields of data are included. 

4. Index to Publications of the United States Department of 
Agriculture. 

5. Publications of the Bureau of Standards, 1901-1925.

6. Supplementary List of Publications of the Bureau of Standards, 
1925-1931. 

7. List of Selected Publications (Issued by the Bureau of For- 
eign and Domestic Commerce) 

8. List of Available Publications — Miscellaneous Publications, 
No. 60. (Always carries same number, No. 60) 

9. United States Tariff Commission. 

10. Monthly Check List of State Publications; U.S. Library of 
Congress. 

These ten major United States Government indexes do not 
exhaust the list of Federal indexes or published sources of infor- 
mation. They are only the larger and more inclusive ones. The 
student who is interested in securing data to answer a statistical
question will find that librarians can secure for him many other 
minor sources of information as to the Government publications. 
All of these publications are only indexes of sources. 

Principal General Federal Publications 

In a general elementary statistics text only the most used and 
general Federal sources should be named specifically. These are: 

1. The Decennial Census (Beginning in 1790 and available for 
each succeeding decade up to and including 1940) 

Many libraries will have only the last three to five census re- 
ports, 1900, 1910, 1920, 1930, and 1940. Smaller libraries may 
have only one or two of the more recent reports. Few libraries
will go back beyond 1860. These sources are invaluable for many
purposes. The usual present subdivisions of the Decennial Census are:

1. Population (very detailed and complete) 

2. Agriculture 

3. Distribution 

4. Retail Trade 

5. Unemployment 

2. The Census of Agriculture (taken every ten years as of 1925,
1935, etc.). The figures for the agricultural census are also included
in the Decennial Census of 1920, 1930, and 1940. 

3. Census of Manufactures (biennial). It reports the number of
establishments, employees, value of products and principal costs. 

Special Federal Publications 

1. Statistical Abstract of the United States. 

This is a general summary of statistical information in many 
fields. The data are arranged in table form according to class 
intervals and time periods.

2. Foreign Commerce Yearbook of the United States. 

A digest of all data on exports and imports of all commodities 
from and to the United States, from all foreign countries in terms
of quantities and values. 
3. Foreign Commerce and Navigation of the United States.

Data on exports and imports of all principal products from all
states and ports of the United States to all foreign countries.

4. Industrial Market Data Handbook of the United States. 

Data on all principal manufacturing, processing, and mineral
production in the United States by states and counties, giving
number of plants, wages, cost of materials, value of products, and
other related information.

5. Consumers Market Data Handbook of the United States. 

Data by states, counties, and cities on 82 series, ranging all the
way from population, types and volume of business, employment,
retail distribution, incomes, telephones, automobiles, and electric
equipment to many other items.

6. Financial Statistics of State and Local Government. 

Data on state, county and city revenues, debts and expendi- 
tures from various sources (an excellent source of information). 

7. Budget of the United States. 

Statement of expenditures for the two past fiscal years and
estimates for the coming fiscal year, in detail by departments,
bureaus, sections, and items.

8. Handbook of Labor Statistics. 

Data on employment, wages, living conditions, housing, health, 
income, industrial relations, migration of labor, productivity and 
related questions. 

9. Survey of Current Business. 

Published monthly by the Department of Commerce, giving
data on industrial production, income, cost of living, prices,
construction, retail and wholesale trade, employment, transportation,
and most other business activities.

10. Market Research Sources (biennial). 

Published by the Department of Commerce, giving quite complete
information on all research projects and studies on marketing in 
the domestic market. 
11. Monthly Labor Review.

Published monthly by the Bureau of Labor Statistics, giving
data on employment, unemployment, wages, prices, levels of 
living and labor legislation. 

12. Agricultural Statistics. 

Published annually by the Department of Agriculture, containing
a wide variety of data on all phases of agricultural production
and prices for the United States and foreign countries. 

13. Mineral Yearbook. 

Data on the production of all minerals in the United States 
and foreign countries in quantities and values in great detail. 

14. Statistics of Income. 

Published annually by the U.S. Treasury Department, contain- 
ing extensive data on all types of income by sources and geo- 
graphical divisions. 

15. Federal Reserve Bulletin. 

Published monthly by the Federal Reserve Board, containing a 
wide variety of data on all banking and monetary activities for 
the U.S. and foreign countries, besides many indexes on pro- 
duction, employment, wages, payrolls, and related business ac- 
tivities. 

There are scores of other statistical reports, summaries, analyses, 
and yearbooks published by the various departments, bureaus 
and commissions of the United States. The number is too great 
to include specific reference to all of them here. The student 
will find them all listed in the general indexes given above.

NON-GOVERNMENT INDEXES 

All libraries will contain some or all of the following indexes on 
general statistical sources and studies. They are: 

1. Readers’ Guide to Periodical Literature (Published by the
H. W. Wilson Co., 950 University Ave., N.Y.) 
An index of 120 or more of the principal popular and scientific
magazines, by author and title. An excellent service. 

2. Educational Index. 

Published five times a year. A general reference on all educa- 
tional publications. 

3. Industrial Arts Index (Chemistry, Physics, etc.) 

An index of over 250 leading scientific periodicals. Published 
annually. 

4. Public Affairs Information Service, H. W. Wilson Co. 
Published annually by Public Affairs Information Service, N.Y. 

5. Agricultural Index, H. W. Wilson Co. 

A general index on all periodical literature, books and bulletins 
on public affairs. 

6. The New York Times Index. 

Published monthly. A complete index to all articles and news 
in the New York Times. 

7. Cumulative Book Index, H. W. Wilson Co. 

A world list of books in the English language. 

8. International Index to Periodicals, H. W. Wilson Co.

Devoted primarily to the humanities and science. A good service.

9. Trade and Professional Associations of the United States. 
Government Printing Office. 


SOURCES IN SPECIAL FIELDS 

It is desirable even for beginning students to be acquainted 
with a few of the leading non-governmental sources in the prin- 
cipal fields of statistical studies. 

Business 

1. Moody’s Investment Service, 65 Broadway, N.Y.

Data on banks, financial companies, industrial corporations, 
railways, public utilities, stocks and bonds. 
2. Standard & Poor's Corporation, N.Y.

An extensive service supplying data on all types of business 
activities, industrial prices, production, wages, transportation, 
construction and other series. 

Economics 

1. The American Economic Review, Journal of the American 
Economic Association. 

2. The Quarterly Journal of Economics.

Education 

1. The 1940 Mental Measurements Yearbook, Buros, Oscar K.,
The Mental Measurements Yearbook, Highland Park, New York. 

2. Personnel Bibliography Index, Cowley, W. H., Ohio State 
University, Columbus, Ohio. 

3. Educational and Psychological Measurement (quarterly), Science
Research Associates, 1700 Prairie Avenue, Chicago, Illinois.

4. Journal of Educational Research, Barr, A. S., Jr., Editor, Box 
21, Old Engineering Building, University of Wisconsin. (A 
monthly magazine established in 1920) 

General References 

The Annals of the American Academy of Political and Social 
Science. Concord, N.H. 


Psychology 

1. Journal of Applied Psychology, Porter, James P., Editor and 
Publisher, Ohio University, Athens, Ohio. (A bi-monthly journal) 

Sociology 

1. The American Journal of Sociology, University of Chicago, 
Chicago, Illinois, Herbert Blumer, Editor. 

2. Social Forces, The University of North Carolina, Chapel 
Hill, N.C., H. W. Odum, Editor. 

3. The American Sociological Review, Vassar College, Pough- 
keepsie, N.Y., J. K. Folsom, Editor. 
4. Sociology and Social Research, University of Southern California,
Los Angeles, California, E. S. Bogardus, Editor.

5. Rural Sociology, University of North Carolina, Raleigh, N.C., 
Carl C. Zimmerman, Editor. 


Statistics 

Journal of the American Statistical Association. 


NON-LIBRARY SOURCES 

The student should not think that all secondary data are in
libraries. The world is full of them; secondary data are everywhere.
All the accounting records of all business firms, from the corner
grocery store to the United States Steel Corporation and General
Motors, are secondary data to all persons other than the original
collectors, and are frequently used by business firms for statistical
purposes.


PUBLIC RECORDS 

All the records in the many offices of the forty-eight state 
governments, including the secretary of state, the railway and 
public service commissions, the highway departments, the state 
public land commissions, the state treasurer's office, the depart- 
ment of education, the department of labor, the tax commissions, 
and all the many other state offices, bureaus, departments, and 
commissions contain large quantities of invaluable secondary 
data which usually can be made available to the researcher. 

The records of all the offices of our thousands of American cities, 
towns, and villages from New York City and Chicago down to 
our smallest communities contain many valuable secondary data. 
Besides these there are the extensive records of the more than 
three thousand counties in the United States, including data on 
lands, deeds, taxes, mortgages, wills, births, deaths, marriages,
divorces, suits at law, school records, roads, delinquency, local 
relief, elections, and other economic, social, and political affairs. 
In addition to all of these there are the data accumulated in the 
official records of tens of thousands of local townships and
hundreds of thousands of school districts, cemetery associations,
churches, and newspaper offices. 

These are all secondary data when used by persons other than
those who originally collected them, and are valuable for
statistical studies of many kinds.


CHECKING ACCURACY OF DATA 

Since secondary data are second-hand data, collected by some- 
one else in the past for other purposes, it is always necessary for 
the statistician who uses them to check them as to accuracy and 
homogeneity. The points to be covered in checking such data are: 

1. Under what conditions were they collected and for what 
purpose? Did any special responsibility attach to the collector 
or the giver of the data causing him to use ordinary or special 
care in his work? Were the data gathered by those having first- 
hand knowledge of them, or were they based on hearsay or mere 
estimates? How dependable are they? 

2. Are the data collected from several areas or over a period of 
time all expressed in the same statistical unit? Has there been 
a change from long tons to short tons, or from bushels to
hundredweights, from household to family, or in the definition of farm,
or business firm, or mile of highway? In other words, are we 
comparing and adding units of the same size, quality, and measure- 
ment throughout the study? 

3. Are the data from the same area? This is of prime impor- 
tance for all spatial data. Frequently a county is divided, two or 
more villages consolidated, and the boundaries of wards, school 
districts, precincts, and even states altered. Farms, plantations, 
and business corporations are often consolidated or divided. Are 
the data for the entire study taken for the same or for altered 
areas? 

4. Are there omissions or duplications in the data? Were some 
of the records lost or were some of the facts never included in 
the records? Are the incomplete data available reasonably repre- 
sentative of the entire area or period, or do the gaps invalidate 
the measurements? In every case in which secondary data are 
used, the statistician must satisfy himself as to their accuracy
and if he uses them must make notations as to the breaks and dis- 
crepancies in them. The best way to check data for accuracy is 
to compare several sources if such are available. For instance, 
district school population figures may be compared with city 
directories and with the Decennial Census. Crop yields reported 
by various authorities may be checked against each other. Branch 
sales records may be checked against central office records. The 
parts may be compared with the whole. 
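The comparison of several sources described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a method given in the text: the district names and figures are invented, and the 5 per cent tolerance is a hypothetical threshold that the statistician would set for his own study. The second function checks the parts against the whole, as with branch sales against central office records.

```python
def compare_sources(first, second, tolerance=0.05):
    """Flag items whose figures in two sources disagree.

    `first` and `second` map an item (e.g. a school district) to a figure.
    An item is flagged if only one source reports it, or if the two figures
    differ by more than `tolerance` (a proportion of the larger figure).
    The 0.05 default is an assumed threshold, not a fixed rule.
    """
    flagged = {}
    for item in sorted(set(first) | set(second)):
        if item not in first or item not in second:
            flagged[item] = "reported by only one source"
            continue
        a, b = first[item], second[item]
        larger = max(abs(a), abs(b))
        if larger and abs(a - b) / larger > tolerance:
            flagged[item] = f"{a} vs. {b}"
    return flagged


def parts_match_whole(parts, whole):
    """Compare the sum of whole-number parts (e.g. branch sales) with the reported total."""
    return sum(parts) == whole


# Hypothetical figures, invented only to illustrate the comparison.
school_census = {"District 1": 1200, "District 2": 950, "District 3": 430}
city_directory = {"District 1": 1185, "District 2": 1110, "District 4": 300}
print(compare_sources(school_census, city_directory))
print(parts_match_whole([4200, 3100, 2700], 10000))
```

District 1 passes (a difference of about 1 per cent), while District 2 and the districts reported by only one source are set aside for closer examination.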

TESTING THE VALIDITY OF DATA

Checking the accuracy of data means determining whether the 
facts are as represented. Testing the validity of data means de- 
termining whether the data are suitable for answering the ques- 
tions for which they are being used. This point may be illustrated 
with a few cases. 

Are the tax assessment records valid data for measuring the 
wealth of a country? No, for the following reasons: 

1. Not all private property is listed on tax records. Some 
portions of it for one reason or another escape tabulation. 

2. Not all property is listed at full value. An uncertain ratio 
or estimate would have to be used. 

3. Not all property is listed at the same ratio of its true market 
value. For instance, it is not uncommon for one piece of property 
to remain for years at the same value on the tax records although 
its market value may have doubled or quadrupled while another 
piece has declined in value with no adjustment on the records. 

4. In some states public utilities and railways are assessed by 
the state and do not appear on local tax records at all. 

5. Much of the wealth of any community is publicly owned, such
as highways, streets, bridges, public buildings, libraries, schools,
hospitals, etc., and so does not appear on the tax records.

These and other conditions make the tax data invalid as an 
accurate measurement of the country's wealth. The best that 
could be obtained from such data would be a very rough estimate. 
Can the number of families in a community be determined
from the number of school children? No, for the following reasons:

1. Not all families have children in school. Such families would 
be missed. 

2. Not all families with children have the same number in 
school. Of two families with four children, one family might have 
only one child in school while the other might have all four chil- 
dren in school. 

3. No valid comparison could be made between communities 
with such data because some communities are composed more 
largely than others of old people who have no children in school. 

Are the charge account sales slips of a store valid data for the 
measurement of total sales? No, unless cash sales are added to 
them and the value of goods returned is subtracted. 

Is the number of deaths occurring in a city during a year a
valid basis on which to compute the city’s death rate? No, for
the following reasons:

1. Many of the deaths may be of persons dying in city hospitals 
who come from outside the city and are no part of the local 
population. 

2. Some persons living in the city may die or be killed in high- 
way or other accidents outside the city. 

Only the deaths of persons living in the city at the beginning
of the year would constitute valid data for measuring the death
rate of the area.
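The adjustment just described can be sketched as follows. It is a minimal illustration, assuming that each registered death is marked as resident or non-resident and that deaths of residents occurring outside the city can be obtained from other records; the rate per 1,000 persons is the usual convention but is an assumption here, and all the figures are hypothetical.

```python
def resident_death_rate(city_register, resident_deaths_elsewhere, population, per=1000):
    """Deaths of city residents per `per` persons of the resident population.

    city_register: one record per death registered in the city, noting
        whether the deceased lived in the city (hypothetical "resident" key).
    resident_deaths_elsewhere: deaths of residents that occurred outside
        the city (highway or other accidents), taken from other records.
    population: persons living in the city at the beginning of the year.
    """
    # Leave out non-residents who died in city hospitals.
    residents_dying_in_city = sum(1 for death in city_register if death["resident"])
    # Add residents who died outside the city.
    resident_deaths = residents_dying_in_city + resident_deaths_elsewhere
    return resident_deaths / population * per


# Hypothetical figures for illustration only.
register = [{"resident": True}] * 180 + [{"resident": False}] * 40
population = 20_000
naive_rate = len(register) / population * 1000  # counts every death registered in the city
rate = resident_death_rate(register, resident_deaths_elsewhere=12, population=population)
print(f"all registered deaths: {naive_rate:.1f} per 1,000; residents only: {rate:.1f} per 1,000")
```

With these assumed figures the raw count gives 11.0 deaths per 1,000, while the valid resident-only figure is 9.6.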

The student may think of many other illustrations of the 
validity of data or the lack of it. He should keep clearly in mind 
that in every case in which secondary data are used, the statis- 
tician must determine carefully their validity for the use at hand. 
Second-hand data are often no better than second-hand clothing 
or automobiles. 


PRIMARY DATA 

Although secondary data are usually cheaper and more immediately
available, they are of little value if they are not adequate,
accurate, or valid for the question at hand; in that case primary
data must be obtained to solve the problem. All that has been said
in the previous chapter on the organization of a statistical prob- 
lem applies with full force to the collection of primary data. The 
entire study should be carefully planned in detail, and accurate 
decisions made as to the statistical unit to be used, the time and 
the area to be covered, and the methods of analysis to be employed. 
Only after these plans have been perfected is it time to collect the 
data. 


THE SCHEDULE 

An essential part of the planning of the study is the organiza- 
tion of the schedule to be used in obtaining the data. A sta- 
tistical schedule is an organized logical sequence of questions for 
securing specific data for a given problem. 

Some Examples of Schedules 

The typical modern schedule is a compact set of questions arranged
in blocks so that they may be answered by (1) a check, ✓ or X,
(2) a letter, as M, F, (3) a word, as yes, no, in, out, all, part,
none, (4) a number, as 1, 2, $125, 5%, .06, ½, (5) a name, as
John Smith, Mary Brown, (6) a date, as December 7, 1941,
January 1, 1942, or (7) at most by a very brief statement.

The questions are designed to require answers which may be 
stated as quantities, or specific classifications, and may be tabu- 
lated in frequency distributions or exclusive classes. 

The actual size of Schedule I shown opposite is 8 X lOf inches. 
It contains eleven blocks of questions including in all 196 possible 
answers. Of this number 2 are dates, 8 are names, 10 are letters, 
26 are checks, 27 are words or brief phrases, and 123 are numbers. 
It is a good example of a schedule which achieves a maximum
of brevity, exactness, clearness, and objectivity.
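The counts of possible answers given above can be checked, and the share of each kind computed, with a few lines of Python. The counts are those stated in the text; the percentage column is an added illustration showing how heavily a well-made schedule leans on answers that can be tabulated as numbers.

```python
# Possible answers on Schedule I, by kind, as counted in the text.
answer_kinds = {
    "dates": 2,
    "names": 8,
    "letters": 10,
    "checks": 26,
    "words or brief phrases": 27,
    "numbers": 123,
}
total = sum(answer_kinds.values())
print(f"total possible answers: {total}")  # should equal the 196 stated in the text
for kind, count in answer_kinds.items():
    print(f"{kind:<24}{count:>5}{count / total:>8.1%}")
```

The six kinds do add to 196, and numbers alone make up nearly two-thirds (about 62.8 per cent) of the possible answers.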

Nature of Questions 

The framing of the right kind of questions for a schedule re- 
quires much careful thought and considerable experience. Fre- 
quently the cooperation of two or more persons is desirable in 
framing questions. If a question is so stated that it has the same
meaning to several persons and elicits the same answer from all 
of them, it is likely to be a good question, definite, clear and ex- 
plicit. 

Question 2 of block VI in Schedule I (Family Schedule, Town or
Village) above is an excellent question. It reads, “Does family
.... own or .... rent these living quarters?” It may be answered
by one check mark. “Own” and “rent” are perfectly clear
alternatives even to persons of less than ordinary intelligence.
This question is not likely to be misunderstood or to be given an
incorrect or confusing answer. If, however, it were stated, “How
does the family obtain its living quarters?”, the alternatives would
not be so evident. Some might answer, “We own them,” others might
reply, “We rent them,” but someone might give the indefinite reply,
“The place belongs to my father,” or “It belongs to the company
and is included in our salary,” or “We have an option on it.” The
statistician might have difficulty with the last three answers and,
perhaps, would have to go back for more exact information.

This question would be still worse if it were stated, “Explain
how you get your living quarters.” The answer might be, “A
real estate company really owns the house, but we are paying it
out like rent.” This answer is equivalent to “own,” but the
respondent has cluttered up the problem with a lot of useless and
extraneous information which has no bearing on the main issue of
“own” or “rent.”

Many of the other questions in this schedule could be so stated 
that several sentences would be required to answer them. This 
inexactness of question and answer is the thing the makers of 
schedules must avoid. 
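Why a question with exclusive alternatives tabulates so easily can be shown with a short Python sketch. The function name and the sample answers are hypothetical; two of the answers are the indefinite replies quoted above. Answers that fall in one of the two classes go straight into a frequency table, while anything else is set aside for a return visit.

```python
def tabulate(answers, classes=("own", "rent")):
    """Tally answers into exclusive classes.

    Returns (frequency table, answers that fit no class). The class
    names follow the own/rent question; the function is a hypothetical sketch.
    """
    table = {cls: 0 for cls in classes}
    unclassified = []
    for answer in answers:
        key = answer.strip().lower()
        if key in table:
            table[key] += 1
        else:
            unclassified.append(answer)  # needs a return visit or editing
    return table, unclassified


# Check-mark answers (hypothetical) mixed with two indefinite replies from the text.
answers = ["own", "rent", "Own", "The place belongs to my father", "rent",
           "We have an option on it."]
table, unclassified = tabulate(answers)
print(table)
print(unclassified)
```

Every check-mark answer lands in a class; only the free-form replies need further work, which is exactly the cost the text warns against.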

Schedule II is 13J X 19^ inches and contains 16 blocks of
questions with more than 400 possible answers. Only one-fourth
of the schedule is shown below. Many others, both shorter and
longer, better and worse, are available in corporation, research,
and government offices. These two are sufficient to show the
types of questions and the kind of organization which should be
followed.




[Figure: SCHEDULE II]



86 COLLECTING AND EDITING STATISTICAL DATA 


Testing Schedules 

After the schedule has been carefully organized and checked for
omissions in the office, and before large quantities are printed, it
is well to send an interviewer out into the field with a dozen or so
copies to test it under actual field conditions. It is very difficult
to think out beforehand every point that should be covered. If
anything has been overlooked, this field test will likely bring it to
light in the filling out of a few schedules, since actual field
conditions frequently bring to view questions which are difficult
or impossible to anticipate. Such revisions as these practical tests
reveal can then be made before the final schedules are printed.

Instructions for Enumerators 

A complete set of detailed instructions should be prepared for 
the enumerators. It is always well to train schedule takers be- 
fore they begin their work, but in the field questions sometimes 
arise for which they have forgotten the answers, if they ever 
knew them. If they have a complete set of instructions covering 
every detail of data to be obtained and the various circumstances 
under which they may have to get the information, they can do a 
much better job. These instructions should usually be revised 
after the questionnaire or schedule has been tested under field 
conditions. 

THE QUESTIONNAIRE 

Schedules are usually filled in by trained interviewers who 
visit the persons or firms from whom information is desired and 
who are, therefore, able to explain questions which are not clear 
to the persons giving the information. This use of trained workers 
eliminates much confusion and most inaccuracies from the data. 
Many times, however, the expense of using trained interviewers 
to secure answers to the schedule questions through personal 
visits is too large to be justified by the probable value of the study. 
In such cases a less expensive method must be used. The cheaper 
method is usually the questionnaire. Sometimes, however, a 
large number of schedules may be collected at a small cost. This 
is usually true if the traveling costs are not large. The costs can 
be computed only after the problem has been planned and the
location and the quantity of the data required are known. 

A questionnaire is a related series of questions, usually prepared
on a paper with intervening spaces for answers, which is mailed to a
number of persons from whom information is desired. Return
postage is provided for the reply. A trained interviewer usually can
get answers to the questions on a schedule, but a questionnaire is
its own and only advocate. It must be so simple and crystal clear
that the person who is supposed to answer it can understand it
with a minimum of effort and time. Most of the persons to whom
questionnaires are sent are busy people who will throw any
questionnaire into a wastebasket if it takes more than a few minutes
to answer, is not perfectly clear, or pries into their private
business. Questionnaires should, therefore, always be as brief as
possible, clear, and reasonable.

Statisticians who contemplate using questionnaires should consider
well their relationship to the persons who are supposed to answer
the questions. These relationships will usually fall in one of five
classes, as follows:

1. Those who are under your authority, as salesmen under a sales 
manager, soldiers under a commanding officer, branches or sub- 
sidiaries under a central business office, teachers under a dean or 
principal, or even employees under a foreman. 

2. Those who are under some obligation, as debtors to a creditor, 
tenants to a landlord, clients of an insurance firm, etc. 

3. Those who expect to make a profit out of the relationship, as
patrons of a store, a factory, a dispenser of services, an adver- 
tising agency, or an exporter. 

4. Those who expect to or wish to render some service, such as 
schools, churches, lodges, civic organizations, citizens, or patrons. 

5. Those whom you expect merely to accommodate you as a free 
favor, such as unknown persons or organizations which are in no 
way under obligation to you or whom you do not expect to pay 
or reward for their favors. 

These five classes are listed in the diminishing likelihood of your 
receiving a reply, or an adequate reply, from your questionnaire. 






Statisticians should usually refrain from sending questionnaires 
to persons or firms in the fifth class. 

Sometimes a questionnaire has an advantage over a schedule in 
that it is more secret and impersonal, and, therefore, the subject 
is more willing to give the desired information. 


EDITING THE SCHEDULES 

After the filled-in schedules or questionnaires have been re- 
ceived they should be edited before the data are used. Editing is 
a preliminary critical evaluation and checking process. It may 
be done by one or more persons, depending on the size of the 
project and the amount of technical knowledge and skill required. 
In small studies one person usually does it all. The points to be 
covered in editing usually are as follows: 

1. Check the schedule for any omissions. Sometimes the person 
filling in the schedule will accidentally overlook some question. At 
other times he may not know the answer and merely leave the
space vacant. In either case the data should be secured by a 
second effort, if possible, or the schedule, or that portion of it, 
thrown out. 

2. Check for conflicts and contradictions. Frequently two or
more questions are so related and interdependent that one is a 
check on the accuracy of the other. In making schedules and 
questionnaires such internal automatic checks should be care- 
fully designed. This may be done by checking parts against a 
total; two dates or a date and age against each other, or a class 
against a quantity, as in a schedule dated January 1, 1943: 

(1) Age: 40 years, 7 months.

(2) Date of Birth: 4 (day), June (month), 1912 (year).

In this case it is evident that an error has been made in either 
(1) Age, or (2) Date of Birth. The person is either 30 years old 
or was born in 1902. The schedule or questionnaire should be 
sent back for correction or discarded. 

3. Verify answers marked with check marks or letters, such as ✓,
X, F, M, in, out, and the like. For instance, if the name of the
person is John Smith and Sex is marked F, a check or correction
should be made.

4. Look for errors in computations or subject matter. 65 + 12 
may be given as 87. Chicago may be listed as in Lake County 
instead of Cook County, and Kansas City, Missouri, listed as 
Kansas City, Kansas. 

5. Check for conformity among schedules. If the question,
“Amount of arable land on farm?” is answered on most schedules
as including plowable pastures but excluding woodlands, it should
be so answered on all schedules, so as to reach comparable results
in the study. If the question, “Number of minor children?” is
answered to include both boys and girls under 21 years in some of
the schedules, it should not be answered to exclude girls over 18
in other schedules.

After the preliminary checking is completed all schedules an- 
swered properly should be so marked and segregated and the in- 
complete schedules sent back for correction or eliminated from
the study. 
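Several of the editing points above lend themselves to mechanical checking. The Python sketch below automates point 1 (omissions), point 2 (age against date of birth, using the text’s schedule dated January 1, 1943) and point 4 (a total that does not equal its parts, as with 65 + 12 given as 87), then segregates the schedules as just described. The field names and the second schedule are hypothetical, age is simplified to completed years, and points 3 and 5 are left to the editor’s judgment because they need outside knowledge.

```python
from datetime import date

# Hypothetical field names for a simple personal schedule.
REQUIRED_FIELDS = ("name", "sex", "age_years", "birth_date")


def completed_years(birth, as_of):
    """Age in completed years on the date the schedule was taken."""
    years = as_of.year - birth.year
    if (as_of.month, as_of.day) < (birth.month, birth.day):
        years -= 1  # this year's birthday has not yet been reached
    return years


def edit_schedule(schedule, schedule_date):
    """Return a list of problems found; an empty list means the schedule passes."""
    problems = []
    # Point 1: omissions.
    missing = [f for f in REQUIRED_FIELDS if schedule.get(f) is None]
    if missing:
        problems.append("omitted: " + ", ".join(missing))
    # Point 2: conflict between related answers (age against date of birth).
    age, birth = schedule.get("age_years"), schedule.get("birth_date")
    if age is not None and birth is not None:
        computed = completed_years(birth, schedule_date)
        if computed != age:
            problems.append(f"age {age} conflicts with date of birth (gives {computed})")
    # Point 4: a reported total that does not equal the sum of its parts.
    parts, total = schedule.get("parts"), schedule.get("total")
    if parts is not None and total is not None and sum(parts) != total:
        problems.append(f"parts sum to {sum(parts)}, not {total}")
    return problems


def segregate(schedules, schedule_date):
    """Split schedules into those answered properly and those to send back."""
    accepted, returned = [], []
    for schedule in schedules:
        problems = edit_schedule(schedule, schedule_date)
        if problems:
            returned.append((schedule, problems))
        else:
            accepted.append(schedule)
    return accepted, returned


taken_on = date(1943, 1, 1)
bad = {"name": "John Smith", "sex": "M", "age_years": 40,
       "birth_date": date(1912, 6, 4), "parts": [65, 12], "total": 87}
good = {"name": "Mary Brown", "sex": "F", "age_years": 30,
        "birth_date": date(1912, 6, 4), "parts": [65, 12], "total": 77}
accepted, returned = segregate([good, bad], taken_on)
for schedule, problems in returned:
    print(schedule["name"], problems)
```

The first schedule is returned with both faults listed, which tells the enumerator exactly what to correct on the second visit.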

CODING DATA 

In modern statistical work, in which machines are largely used,
much of the data is frequently coded to shorten and facilitate
tabulation. For instance, the members of a family, (1) husband,
(2) wife, and (3) several children, may be tabulated as 1, 2, 3, 4,
5, 6, etc., or 11, 12, 13, 14, 15, etc. Dates such as Jan. 14, 1943,
and Dec. 24, 1944, may be coded as 01, 14, 43 and 12, 24, 44, etc.
Large numbers may be shortened by cutting off the final ,000 or
,000,000. Any coding that will change written statements or words
to numbers, or shorten the processes of tabulation and computation,
should be employed. Numbers expressed as decimal fractions, such
as .0122, .0097, .0107, etc., may be coded by multiplying by 10,000,
or by moving the decimal point four places to the right, making
the numbers, in this case, whole numbers, 122, 97, and 107. They
can be further coded by subtracting a common number from each
item, as follows:

122 - 100 = 22
 97 - 100 = -3
107 - 100 =  7
By the two methods of coding, the original numbers, which were
decimal fractions, have been reduced to the small numbers 22, -3,
and 7, which are easy to manipulate. This type of coding requires
decoding after the desired statistic or statistics are computed, and
the steps must be undone in reverse order. In this case 100 would
first have to be added to any average to counteract the subtraction
of 100, and the result divided by 10,000 to counteract the original
multiplication by 10,000.
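The coding and decoding of the three decimal fractions can be sketched in Python, along with the date coding mentioned earlier. The function names are hypothetical, and the two-digit year in `code_date` assumes that every date in the study falls within one century. The essential point is the order of decoding: add the common number back first, then divide.

```python
from datetime import date


def code_decimals(values, scale=10_000, origin=100):
    """Move the decimal point (multiply by `scale`), then subtract a common `origin`."""
    # round() guards against binary floating-point error in v * scale
    return [round(v * scale) - origin for v in values]


def decode_mean(coded_mean, scale=10_000, origin=100):
    """Undo the coding in reverse order: add the origin back, then divide by the scale."""
    return (coded_mean + origin) / scale


def code_date(d):
    """Code a date as two-digit month, day, and year, e.g. Jan. 14, 1943 -> ('01', '14', '43')."""
    # Two-digit years assume every date in the study falls within one century.
    return (f"{d.month:02d}", f"{d.day:02d}", f"{d.year % 100:02d}")


raw = [0.0122, 0.0097, 0.0107]
coded = code_decimals(raw)                # [22, -3, 7]
coded_mean = sum(coded) / len(coded)      # mean of the small coded numbers
print(decode_mean(coded_mean))            # equals the mean of the original decimals
print(code_date(date(1943, 1, 14)), code_date(date(1944, 12, 24)))
```

Decoding in the wrong order, dividing by 10,000 first and then adding 100, would give a figure a little above 100 instead of about .0109.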

Sometimes when the tabulations are to be made without ma- 
chines, the data are transferred from the original schedules to 
smaller cards. This has the double advantage (1) of saving the 
original schedules from being quickly worn out by too frequent 
handling and (2) of reducing the required figures and other in- 
formation to smaller space which materially facilitates their tabu- 
lation. In transferring the original data to the cards, they can 
usually be condensed or abbreviated. 

SUMMARY 

1. Secondary data are those already in existence and which have 
been collected for some other purpose than the answering of the question 
at hand. 

2. Primary data are those collected first hand by the statistician in 
order to answer the question at hand. 

3. The most effective use of libraries for statistical work is made by
employing the fullest and most accurate indexes of statistical sources. 

4. The statistician should become thoroughly familiar with all prin- 
cipal sources and subject matter in his special field of work. Access to 
the stacks is an excellent means of doing this. 

5. All data should be checked for accuracy. This is especially true 
of secondary data. 

6. Secondary data must be tested for validity. Do these data apply 
to this problem? is the question. 

7. A statistical schedule is an organized logical sequence of questions 
for securing specific data for a given problem. 

8. A questionnaire is a related series of questions, usually prepared 
on a paper with intervening spaces for answers, which is mailed to a
number of persons from whom information is desired. 

9. Questionnaires should be sent only to those from whom answers 
may reasonably be expected. 

10. Coding data means reducing them to a briefer and simpler form for
purposes of tabulation.
REVIEW QUESTIONS

1. What determines the kind of data sought for a statistical problem?
Why? 

2. Define secondary data. Indicate sources of such data not men- 
tioned in the text. 

3. What are the functions of libraries as related to secondary data 
and research? 

4. What are the best ways in which to use libraries for statistical 
research? 

5. Name the ten most important United States indexes and indicate 
the general nature of the data each covers. 

6. What Federal censuses are taken and when? 

7. Name and give a general idea of the contents of seven special 
U.S. statistical publications. 

8. Name and give a general idea of the contents of seven non-govern- 
ment indexes. 

9. What are some of the best sources of information in each of the 
following fields: 

1. Business 

2. Economics 

3. Education 

4. Psychology 

5. Sociology 

6. Theoretical Statistics 

10. What are the various public records which may be useful to the 
statistician? 

11. How is the accuracy of data checked? Explain in detail. 

12. What is testing the validity of data? Explain in detail. 

13. What are primary data? 

14. What is a schedule? Explain in detail. 

15. How should the questions be drawn up? Why? 

16. How should schedules be checked and tested before they are put 
into final form? 
17. What instructions should accompany the set of schedules for the
schedule taker and how should he be trained for his work? 

18. What is a questionnaire? How does it differ from a schedule? 

19. What are the relative advantages of the schedule as compared 
with the questionnaire? 

20. Explain the purpose and methods of coding data. 


EXERCISES 

1. Prepare a schedule and instruction sheet for the schedule takers 
to secure data to answer the question, “What percentage of the total 
automobiles in your community are Fords, Chevrolets, Plymouths, 
Nashes, or Pontiacs?” 

2. Prepare a schedule and instruction sheet for the schedule takers 
to secure data to answer the question, “What percentage of the boys 
in your community who smoke cigarettes learned to smoke in the 9th, 
10th, 11th, or 12th grades of high school?”

3. Prepare a questionnaire to secure data to answer the question, “In 
what grade in school do boys and girls (1) learn to dance, (2) have their 
first dates, and (3) join their first social cliques or societies?”

4. Prepare a questionnaire to secure data to answer the question, 
“What percentage of the children by sex, age, and distance from school 
would prefer to change from ‘daylight-saving time’ to regular time during
part of the school year, and in what month would they prefer to make
the change?”

5. Set up a plan for a statistical study to determine the average as- 
sessed value of single family residences in the poorest one-third, the middle 
one-third, and the richest one-third of your city. 
RAW DATA

Data is the term used to designate the raw measurements or
numbers which are to be analyzed. The data are a sample drawn 
from a larger body of measurements, facts, or numbers called a 
population or universe. A population or universe is an entire 
field of data, such as the heights of all school children, the lengths 
of all cotton lint, the wages of all carpenters, the prices of all 
common stocks, the yields of all acres of wheat, or the lengths of 
all oak leaves. A population is usually a large field of data. A 
sample is usually a relatively small portion of that large field
chosen for statistical study. Populations, however, vary greatly
in size. Some are without approachable limits, such as the number
of bacteria, oak leaves, insects, stars, amoebae, price quotations
on potatoes, exchanges of goods and services of many kinds, snow- 
flakes, and grains of sand. Even the number of human beings on 
earth now exceeds 2,000,000,000 and is increasing. In fact, most 
populations with which we are concerned are so large that we can 
never view or measure all their items. On the other hand, there 
are some populations which are quite small. Egyptian vases of 
certain dynasties are few. Coins of ancient Ephesus and boats of 
the Vikings and objects of many other interesting topics of study 
are scarce. In some cases, although a sample was expanded to 
the utmost workable limit, it could include only a small portion 
of the universe. In other cases the sample might include most or 
even all of the available data. 
DISCRETE AND CONTINUOUS SERIES

In some types of data the unit is not divisible. It must be taken 
and counted in whole units or none. Such series of data are said 
to be discrete. One may have one-half pint of water, nine-tenths of 
a bushel of wheat, or three-fourths of a ton of coal, but half a boy, 
one-fourth of a soldier, or the tenth of a wife is an impossibility. 
Any series in which the units are indivisible entities is discrete. 
Each unit of data is separate and complete. A continuous series is 
one in which the units may be measured in fractions of any size 
no matter how small, so that there is a continuous flow of items 
of data with gradations infinitely minute. A ton of coal can be 
divided into hundredweights and these into pounds and the pounds 
into ounces and these into ten-millionths of an ounce and so on 
without theoretical limits. Such small units of data touch each 
other in a continuous flow of measurements. 

A fuller and more exact treatment of samples and sampling will 
be developed in Chapter 14. It is sufficient at this point to re- 
call that in statistics we are concerned primarily with the analysis 
of samples selected from larger groups of data. Ordinarily these 
samples when first selected consist of record sheets, or schedules, 
or questionnaires, which are not arranged in any systematic or 
regular order. They are simply collections of figures, such as 
the pile of sales slips in a store at the end of the day, or a pile of 
questionnaires as they were returned by mail over a period of 
days. Worksheet No. 1 is such a sample. 

THE STATISTICAL ANALYSIS OF SIMPLE DATA 

The beginning student is likely to think that statistical analyses 
are abstruse, difficult, highly mathematical procedures which are 
applicable only to mysterious data obtained in the recesses of a 
laboratory or experiment station. Quite the opposite is true. 
Statistics can resolve experimental data of the most difficult
types, but by far the widest and perhaps the more important use of
statistics is in finding correct answers to the large number of
simple, commonplace, everyday problems of life. To illustrate this
simple, practical, universal use of statistics, this text employs two
of the simplest series of data in the world: the yield of wheat per
acre, and the heights and weights of school children. The heights
of persons are a matter of universal knowledge. The yields of
field crops are familiar to everyone. These humble, underfoot
data are as appropriate for statistical analysis as the rarest
and most remote series. Often the analysis of simple things is more
interesting, useful, and valuable from a practical standpoint than
the analysis of things that are more remote. The elementary student
may well begin his study of statistics with series of data within his
own narrow experience.

The data on the height, weight, and age of 106 school children 
were obtained by taking from the home room records of the ele- 
mentary schools of Stillwater, Oklahoma, the measurements for 
each tenth child on the school roll. Since the children were en- 
rolled alphabetically in each of the twenty-five rooms of the 
elementary city schools and there was no special selection or bias 
of any kind as to age, sex, economic status, or mental age, it is 
thought that the sample fairly represents the normal school grade 
distribution of these three measurements. Since Stillwater has a 
population drawn from every state in the nation, the sample would
seem to be fairly representative of the general white elementary 
school population of small cities of the United States. 
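The sampling rule described above takes each tenth child on the roll. Here is a minimal Python sketch of it. It is an illustration, not the author's procedure. The roll of 1,060 names is hypothetical, since the text gives only the resulting sample of 106. Starting the count with the first child on the roll is also an assumption.

```python
# Sketch: systematic sample of every tenth child on an ordered roll.
# The roll below is HYPOTHETICAL; the text does not publish the roll.

def every_kth(roll, k=10, start=0):
    """Return every k-th entry of an ordered roll, beginning at index `start`."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0 <= start < k:
        raise ValueError("start must lie within the first k entries")
    return roll[start::k]


# Hypothetical roll of 1,060 children, listed alphabetically by room.
roll = [f"child-{i:04d}" for i in range(1, 1061)]
sample = every_kth(roll, k=10)

print(len(sample))   # 106
print(sample[:3])    # ['child-0001', 'child-0011', 'child-0021']
```

Setting `start` to any value from 0 to 9 gives a different but equally systematic sample of the same size from a roll of this length.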

WORKSHEET NO. 1 

In Worksheet No. 1, the data are mixed indiscriminately in 
the order in which they were taken from the school records. 

The Array 

The first step in the analysis of data is to arrange them in some
systematic order according to some logical plan. The simplest
and easiest method is to sort the data according to size, ranging
from the smallest item to the largest. Such an arrangement is
called an array. Worksheet No. 2 is such a group of arrays, first
for height, then for weight, and last for age. In each case the
smallest figure is placed first, the others follow it in order of
size, and the largest number comes last.

WORKSHEET NO. 1

Height, Weight, and Age of 106 Stillwater, Oklahoma, School
Children of Grades 1 to 6 Inclusive, 1939

Height  Weight  Age       Height  Weight  Age       Height  Weight  Age
Inches  Lbs.    Months    Inches  Lbs.    Months    Inches  Lbs.    Months
53      61      110       51      65      106       49      52      107
46      40      80        48      59      88        51      65      120
51      49      108       47      52      86        48      56      89
60      98      142       54      73      122       41      41      72
53      57      103       54      88      132       60      99      151
62      92      130       48      54      96        56      83      141
54      54      106       57      64      138       50      53      101
53      61      104       43      46      75        51      63      104
49      54      104       52      62      95        53      69      107
53      82      110       47      47      87        59      83      140
49      53      91        47      57      96        51      54      102
53      55      106       45      47      75        51      61      103
51      58      96        56      69      133       56      80      140
54      61      112       66      102     173       51      55      108
47      45      81        56      90      127       48      57      78
57      75      117       47      56      82        47      50      84
50      48      92        56      108     168       50      57      92
58      84      139       52      61      110       48      52      89
53      66      109       50      54      82        58      81      118
59      74      141       55      72      116       62      89      131
44      42      75        52      59      110       47      57      108
43      38      72        50      50      97        54      63      120
56      69      127       54      70      105       54      66      101
56      73      134       57      66      118       48      49      83
59      96      144       54      67      113       49      52      90
60      94      161       53      62      134       51      65      97
54      67      121       56      71      128       51      65      108
60      92      143       57      100     131       50      48      98
53      68      108       48      48      85        53      52      99
59      86      142       59      79      131       59      98      141
48      53      107       47      53      83        55      70      107
49      54      87        44      47      78        50      55      109
54      70      116       61      100     154       52      70      102
47      47      96        54      78      130       53      57      103
48      49      84        49      50      88        58      76      129
53      62      104

WORKSHEET NO. 2

An Array of Data on 106 Stillwater, Oklahoma,
School Children

Array No. 1        Array No. 2        Array No. 3
Height, Inches     Weight, Lbs.       Age, Months
41   50   54       38   55   69       72   99   117
43   50   54       40   55   69       72   101  118
43   50   54       41   56   70       75   101  118
44   51   54       42   56   70       75   102  120
44   51   54       45   57   70       75   102  120
45   51   55       46   57   70       78   103  121
46   51   55       47   57   71       78   103  122
47   51   56       47   57   72       80   103  127
47   51   56       47   57   73       81   104  127
47   51   56       47   57   73       82   104  128
47   51   56       48   58   74       82   104  129
47   51   56       48   59   75       83   104  130
47   51   56       48   59   76       83   105  130
47   52   56       49   61   78       84   106  131
47   52   56       49   61   79       84   106  131
47   52   57       49   61   80       85   106  131
48   52   57       50   61   81       86   107  132
48   53   57       50   61   82       87   107  133
48   53   57       50   62   83       87   107  134
48   53   58       52   62   83       88   107  134
48   53   58       52   62   84       88   108  138
48   53   58       52   63   86       89   108  139
48   53   59       52   63   88       89   108  140
48   53   59       52   64   89       90   108  140
48   53   59       53   65   90       91   108  141
49   53   59       53   65   92       92   109  141
49   53   59       53   65   92       92   109  141
49   53   59       53   65   94       95   110  142
49   53   60       54   66   96       96   110  142
49   54   60       54   66   98       96   110  143
49   54   60       54   66   98       96   110  144
50   54   60       54   67   99       96   112  151
50   54   61       54   67   100      97   113  154
50   54   62       54   68   100      97   116  161
50   54   62       55   69   102      98   116  168
          66                 108                173

An array at once reveals three facts concerning the data:

1. The range of the items from the smallest to the largest. The
range in height is 41-66 inches, inclusive; in weight, 38-108 pounds,
inclusive; and in age, 72-173 months, inclusive.

2. The item that occurs most frequently. There are 12 children
53 inches tall, which is a greater number than that of any other
height. Six children weigh 54 pounds and six weigh 57 pounds.
No other weight group includes as many. Five children are 108
months old. All other age groups are fewer in number.

3. The middle item in each array, easily located by counting
halfway through it, gives a rough idea of the average of the sample.
Since there are 106 items, the middle falls between the 53rd and the
54th. The middle item of the array of heights is 53; that of weight
is 61 or 62; that of age is 107 months. From these three arrays
of data we have arrived in a few moments at a considerable
amount of information concerning these children, as follows:



                    Range   Mode       Median
Height in inches    26      53         53
Weight in pounds    71      54 or 57   61.5
Age in months       102     108        107


We may reasonably conclude that grade school children average
53 inches tall and vary from about 13 inches shorter to 13 inches
taller than the average; that their average weight is about 60
pounds, ranging from about 23 pounds less than the average to
nearly 50 pounds more; that they average about 9 years old,
ranging from 6 years to about 14.4 years. For a very simple problem
these results may be sufficient, but in most cases a more complete
analysis will be desired.
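The three facts read from an array can also be computed directly. The sketch below uses Python's standard `collections.Counter` and `statistics.median`. It sorts the heights of Worksheet No. 1 into an array and recovers the range, mode, and median given in the table above. Note that the table counts both end values in the range (66 - 41 + 1 = 26). Many texts instead define the range as the plain difference, 66 - 41 = 25.

```python
from collections import Counter
from statistics import median

# Heights in inches, in the order of Worksheet No. 1 (read across each row).
HEIGHTS = [
    53, 51, 49, 46, 48, 51, 51, 47, 48, 60, 54, 41, 53, 54, 60, 62, 48, 56,
    54, 57, 50, 53, 43, 51, 49, 52, 53, 53, 47, 59, 49, 47, 51, 53, 45, 51,
    51, 56, 56, 54, 66, 51, 47, 56, 48, 57, 47, 47, 50, 56, 50, 58, 52, 48,
    53, 50, 58, 59, 55, 62, 44, 52, 47, 43, 50, 54, 56, 54, 54, 56, 57, 48,
    59, 54, 49, 60, 53, 51, 54, 56, 51, 60, 57, 50, 53, 48, 53, 59, 59, 59,
    48, 47, 55, 49, 44, 50, 54, 61, 52, 47, 54, 53, 48, 49, 58, 53,
]

array = sorted(HEIGHTS)              # the array: smallest item first
low, high = array[0], array[-1]
inclusive_range = high - low + 1     # counts both end values, as in the table
mode, mode_count = Counter(array).most_common(1)[0]
middle = median(array)               # average of the 53rd and 54th items

print(low, high, inclusive_range)    # 41 66 26
print(mode, mode_count)              # 53 12
print(middle)                        # 53.0
```

For the weights, `most_common(1)` would report only one of the two tied modes, 54 and 57. `statistics.multimode` returns all of them.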


Tally Sheets 

When we have discovered this information we have about ex- 
hausted the knowledge available from the arrays. More complex 
and detailed methods are necessary to reveal other still obscured 
relationships. The next logical step is the creation of a tally 
sheet. This may be done first by listing the number of items 
falling in each unit of height, weight, and age, as follows: 
WORKSHEET NO. 3

Frequency Distribution of the Height of 106 Stillwater,
Oklahoma, School Children (in inches)

Height  No.     Height  No.     Height  No.
41      1       50      7       59      6
42      0       51      10      60      4
43      2       52      4       61      1
44      2       53      12      62      2
45      1       54      11      63      0
46      1       55      2       64      0
47      9       56      8       65      0
48      9       57      4       66      1
49      6       58      3

                                Total   106


WORKSHEET NO. 4

Frequency Distribution of the Weights of 106 School
Children (in pounds)

Weight  No.     Weight  No.     Weight  No.     Weight  No.
38      1       56      2       74      1       92      2
39      0       57      6       75      1       93      0
40      1       58      1       76      1       94      1
41      1       59      2       77      0       95      0
42      1       60      0       78      1       96      1
43      0       61      5       79      1       97      0
44      0       62      3       80      1       98      2
45      1       63      2       81      1       99      1
46      1       64      1       82      1       100     2
47      4       65      4       83      2       101     0
48      3       66      3       84      1       102     1
49      3       67      2       85      0       103     0
50      3       68      1       86      1       104     0
51      0       69      3       87      0       105     0
52      5       70      4       88      1       106     0
53      4       71      1       89      1       107     0
54      6       72      1       90      1       108     1
55      3       73      2       91      0

                                                Total   106
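Building a tally sheet by hand amounts to counting how many items fall on each unit of the scale, including the units on which none fall. The following is a minimal Python sketch using the heights of Worksheet No. 1. The function names `unit_tally` and `strokes` are illustrative. The strokes are plain vertical bars in groups of five, not the crossed fives of a hand tally.

```python
from collections import Counter

# Heights in inches, in the order of Worksheet No. 1.
HEIGHTS = [
    53, 51, 49, 46, 48, 51, 51, 47, 48, 60, 54, 41, 53, 54, 60, 62, 48, 56,
    54, 57, 50, 53, 43, 51, 49, 52, 53, 53, 47, 59, 49, 47, 51, 53, 45, 51,
    51, 56, 56, 54, 66, 51, 47, 56, 48, 57, 47, 47, 50, 56, 50, 58, 52, 48,
    53, 50, 58, 59, 55, 62, 44, 52, 47, 43, 50, 54, 56, 54, 54, 56, 57, 48,
    59, 54, 49, 60, 53, 51, 54, 56, 51, 60, 57, 50, 53, 48, 53, 59, 59, 59,
    48, 47, 55, 49, 44, 50, 54, 61, 52, 47, 54, 53, 48, 49, 58, 53,
]


def unit_tally(values):
    """Frequency of every whole unit from the smallest to the largest value.

    Units on which no item falls are kept with a frequency of 0.
    """
    counts = Counter(values)
    return {unit: counts[unit] for unit in range(min(values), max(values) + 1)}


def strokes(n):
    """Tally strokes in groups of five, e.g. 7 -> '||||| ||'."""
    groups = ["|||||"] * (n // 5)
    if n % 5:
        groups.append("|" * (n % 5))
    return " ".join(groups)


tally = unit_tally(HEIGHTS)
for unit, f in tally.items():
    print(f"{unit}  {strokes(f):<16} {f}")
```

The zero lines for 42, 63, 64, and 65 inches are the vacant classes discussed below.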
Class Intervals

Ordinarily the zero should not be written in the line of the unit
or class in which no item falls; the space in the number column
should be left blank for that unit or class. The zeros are written
in the tally sheets above only for emphasis. The point to be
emphasized is that when so many classes are used, as in this case
one class for each unit of the variable, there will likely be several
vacant classes unless the sample is very large. Better results
can be obtained by making the classes wider, so that each will
include two or more units of data. Instead of having a separate
class for each inch of height, a class might be two or more inches
wide, as 40 to 42 inches, 42 to 44 inches, or even 40 to 50 inches.
For weight, instead of having a class for each pound, each class may
span two or more pounds, as 38 to 40 pounds, 40 to 42 pounds; or
35 to 40 pounds, 40 to 45 pounds; or even 35 to 45 pounds.

In constructing class intervals for data, the question inevitably
arises, How wide should classes be? The width of the classes is
determined by the number of classes. Since the smallest and the
largest item of data must be included in the tally sheet, the width
of each class is a function of the range of the frequency distribution
and the number of classes. In the case of the height of school
children, the range is 26, or from 41 to 66 inclusive. If we have
only two classes, each one may be 13 inches wide, as 41 to 54,
54 to 67. If we decide on three class intervals, each 9 inches wide,
the distribution may be as follows:


WORKSHEET NO. 5

Frequency Distribution of the Height of 106 School Children

Class Intervals
in Inches           f
41 to 49            31
50 to 58            61
59 to 67            14

Total               106
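The width needed for a chosen number of classes follows from the range, as described above. This sketch uses illustrative helper names. It rounds the width up to a whole unit so that every item from the smallest to the largest is covered. With the inclusive height range of 26 inches, two classes need 13 inches each and three classes need 9. The three-class version regroups the unit frequencies of Worksheet No. 3 into the counts of Worksheet No. 5.

```python
import math

# Unit frequencies of height (inches) from Worksheet No. 3; empty units omitted.
HEIGHT_FREQ = {41: 1, 43: 2, 44: 2, 45: 1, 46: 1, 47: 9, 48: 9, 49: 6,
               50: 7, 51: 10, 52: 4, 53: 12, 54: 11, 55: 2, 56: 8, 57: 4,
               58: 3, 59: 6, 60: 4, 61: 1, 62: 2, 66: 1}


def class_width(low, high, n_classes):
    """Smallest whole-unit width letting n_classes cover low..high inclusive."""
    units = high - low + 1                  # 26 for the heights
    return math.ceil(units / n_classes)


def group(freq, low, width):
    """Add unit frequencies into classes low..low+width-1, low+width.., etc.

    Only classes that receive at least one item appear in the result.
    """
    classes = {}
    for value, f in freq.items():
        start = low + ((value - low) // width) * width
        classes[start] = classes.get(start, 0) + f
    return dict(sorted(classes.items()))


low, high = min(HEIGHT_FREQ), max(HEIGHT_FREQ)
width = class_width(low, high, 3)
for start, f in group(HEIGHT_FREQ, low, width).items():
    print(f"{start} to {start + width - 1}: {f}")
```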
Rules for Selecting Class Intervals
The number of classes may be fixed at two, or three, or five, or
ten, or sixteen, or twenty, or any other number desired. One of
the decisions which the statistician must make in working out a
tally sheet for a frequency distribution of data is, How many class
intervals shall there be? It is not desirable to fix a set number of
classes for all distributions. Five guiding rules may be laid down
which will roughly indicate the proper number of classes in any
case. The first rule is, the fewer the number of classes and the wider
they are, the greater will be the inaccuracies of the computations made
from them. This may be illustrated by using the various classes
above for the heights of school children. The correct arithmetic
mean of these data is 52.434 inches, computed from the total of the
individual items. The mean computed from the two 13-inch-wide
class intervals is 52.651, or .217 larger than the true mean,
while that computed from the three somewhat narrower class
intervals of Worksheet No. 5 is 52.315, or .119 smaller than the
true mean. The second rule is, the amount of work necessary to
compute a mean increases with the number of class intervals. It is
wise, therefore, to strike a happy medium between too few classes,
two, three, or four, on the one hand, and too many classes, twenty
or twenty-five, on the other. In most cases eight to fifteen classes
are sufficient to insure a high degree of accuracy and few enough
to reduce the work to a reasonable minimum. In accordance
with the above principles, the following classes will be set up for
the data on the school children.

The third rule is, for ease of computation wherever it is possible, 
class intervals should be stated in full units or whole numbers instead 
of in fractions. In all cases above this rule has been followed.
Whole-number limits are better, for instance, than class intervals such as these:

40.0-42.7 
42.8-45.5 
45.6-48.3 etc. 

It is doubtful whether there are many cases in which anything is 
to be gained by splitting the classes into irregular fractions. It 
complicates the analysis and causes confusion. 
WORKSHEET NO. 6

Frequency Distributions of the Heights of 106 Stillwater,
Oklahoma, School Children

Distribution A                 Distribution B
Class Interval                 Class Interval
Inches            f            Inches            f
40-41.9           1            40-43.9           3
42-43.9           2            44-47.9           13
44-45.9           3            48-51.9           32
46-47.9           10           52-55.9           29
48-49.9           15           56-59.9           21
50-51.9           17           60-63.9           7
52-53.9           16           64-67.9           1
54-55.9           13
56-57.9           12
58-59.9           9
60-61.9           5
62-63.9           2
64-65.9
66-67.9           1

Total             106          Total             106


WORKSHEET NO. 7

Tally Sheet and Frequency Distribution
of Heights of 106 School Children

Class Interval
Inches          f
40.5-42.4       1
42.5-44.4       4
44.5-46.4       2
46.5-48.4       18
48.5-50.4       13
50.5-52.4       14
52.5-54.4       23
54.5-56.4       10
56.5-58.4       7
58.5-60.4       10
60.5-62.4       3
62.5-64.4
64.5-66.4       1

Total           106
Worksheet No. 7 well illustrates the great advantage of setting
up class intervals so that the concentration points of the data fall 
as near the mid-point of the class as possible. Such a selection of 
class limits will greatly increase the accuracy of the measures 
computed from the frequency distribution. The correct arith- 
metic mean of the above data is 52.43 inches. When computed 
from Worksheet No. 7, it is 52.44, but when computed from Work- 
sheet No. 6, A, it is 52.93. This is an error of one-half inch caused 
by locating the class intervals, 40-41.9, 42-43.9, instead of 40.5- 
42.4, 42.5-44.4. The latter class intervals place the mid-points 
closer to the concentration points of the data and give almost 
perfect accuracy in results, while the first group gives an error 
of one-half inch in the mean. This is an illustration of Rule Five 
on page 104 in setting up class intervals. 
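The effect of class placement on the mean can be checked by treating every item in a class as if it fell at the class mid-point. The sketch below assumes that convention and uses the unit frequencies of Worksheet No. 3. It reproduces the true mean of 52.43 and the Worksheet No. 7 value of 52.44. For Worksheet No. 6, A it gives about 52.92, slightly different from the 52.93 printed above, but it shows the same half-inch error.

```python
import math

# Unit frequencies of height (inches) from Worksheet No. 3; empty units omitted.
HEIGHT_FREQ = {41: 1, 43: 2, 44: 2, 45: 1, 46: 1, 47: 9, 48: 9, 49: 6,
               50: 7, 51: 10, 52: 4, 53: 12, 54: 11, 55: 2, 56: 8, 57: 4,
               58: 3, 59: 6, 60: 4, 61: 1, 62: 2, 66: 1}


def true_mean(freq):
    """Mean of the individual items: total of all heights divided by N."""
    return sum(v * f for v, f in freq.items()) / sum(freq.values())


def grouped_mean(freq, first_lower, width):
    """Mean computed as if every item lay at the mid-point of its class.

    Class k runs from first_lower + k*width up to, but not including, the
    next lower limit; its mid-point is lower limit + width/2 (Formula No. 1).
    """
    total = 0.0
    n = 0
    for value, f in freq.items():
        k = math.floor((value - first_lower) / width)
        mid_point = first_lower + k * width + width / 2
        total += mid_point * f
        n += f
    return total / n


print(round(true_mean(HEIGHT_FREQ), 2))               # 52.43
print(round(grouped_mean(HEIGHT_FREQ, 40.5, 2), 2))   # 52.44  Worksheet No. 7
print(round(grouped_mean(HEIGHT_FREQ, 40, 2), 2))     # 52.92  Worksheet No. 6, A
```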


WORKSHEET NO. 8

Frequency Distribution of
Weights of 106 Stillwater,
Oklahoma, School Children

Weight, Lbs.    f
35-39.9         1
40-44.9         3
45-49.9         12
50-54.9         18
55-59.9         14
60-64.9         11
65-69.9         13
70-74.9         9
75-79.9         4
80-84.9         6
85-89.9         3
90-94.9         4
95-99.9         4
100-104.9       3
105-109.9       1

N = 106


WORKSHEET NO. 9

Frequency Distribution of
the Ages of 106 Stillwater,
Oklahoma, School Children

Age, Months     f
70-79.9         7
80-89.9         16
90-99.9         13
100-109.9       26
110-119.9       11
120-129.9       8
130-139.9       11
140-149.9       9
150-159.9       2
160-169.9       2
170-179.9       1

N = 106
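Worksheets Nos. 8 and 9 can be produced mechanically from the arrays. This sketch uses illustrative names. It sorts the ages of Array No. 3 into eleven 10-month classes beginning at 70 months. Each class includes its lower limit and runs up to, but not including, the next.

```python
# Ages in months from Array No. 3 (already sorted, smallest first).
AGES = [
    72, 72, 75, 75, 75, 78, 78, 80, 81, 82, 82, 83, 83, 84, 84, 85, 86, 87,
    87, 88, 88, 89, 89, 90, 91, 92, 92, 95, 96, 96, 96, 96, 97, 97, 98, 99,
    101, 101, 102, 102, 103, 103, 103, 104, 104, 104, 104, 105, 106, 106,
    106, 107, 107, 107, 107, 108, 108, 108, 108, 108, 109, 109, 110, 110,
    110, 110, 112, 113, 116, 116, 117, 118, 118, 120, 120, 121, 122, 127,
    127, 128, 129, 130, 130, 131, 131, 131, 132, 133, 134, 134, 138, 139,
    140, 140, 141, 141, 141, 142, 142, 143, 144, 151, 154, 161, 168, 173,
]


def frequency_table(values, first_lower, width, n_classes):
    """Count items in n equal classes; class k starts at first_lower + k*width."""
    counts = [0] * n_classes
    for v in values:
        k = (v - first_lower) // width
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[k] += 1
    return counts


counts = frequency_table(AGES, 70, 10, 11)
for k, f in enumerate(counts):
    lower = 70 + 10 * k
    print(f"{lower}-{lower + 9}.9  {f}")
```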


The fourth rule deserves special emphasis. All class intervals
in any particular distribution should be of exactly identical width.






This rule was followed in all the above examples. The following 
examples are violations of this rule. 


40-41.9     or     40-41.9
42-45.9            42-43.9
46-47.9            44-45.9
48-55.9            46-55.9
56-66.9            56 and up

Such inequality of class intervals invalidates the comparison of 
classes and destroys the value of short-cut methods. 
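A quick check of the fourth rule compares the distances between successive lower limits. The sketch below is illustrative only. An open-ended class such as "56 and up" has no upper limit at all, so it breaks the rule whatever its lower limit.

```python
def has_equal_widths(lower_limits):
    """True when every class starts the same distance after the one before.

    Differences are rounded to six places so limits such as 40.5, 42.5, ...
    are not upset by binary floating-point error.
    """
    gaps = {round(b - a, 6) for a, b in zip(lower_limits, lower_limits[1:])}
    return len(gaps) <= 1


print(has_equal_widths([40, 42, 44, 46, 48]))   # True
print(has_equal_widths([40, 42, 46, 48, 56]))   # False (first example above)
print(has_equal_widths([40, 42, 44, 46, 56]))   # False (second example above)
```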

The fifth rule is, class intervals should be chosen so that any natural
concentration of the data that may occur will tend to fall at the middle
of the class. For instance, the prices of items in a variety store
tend to fall in units of .05, .10, .15, .20, .25, etc. In choosing
class intervals for such data, the class limits should be chosen as
follows:

Class Intervals     Mid-Points
.025-.074           .05
.075-.124           .10
.125-.174           .15
etc.                etc.

This fifth rule may seem to violate rule three to some extent, but
if in any case it does, the fifth rule should be given precedence.
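The limits in the table above follow from placing each concentration point at the center of its class. This sketch uses Python's `decimal` module to avoid binary rounding of cents. The half-width and the .001 step used to state each upper limit are read from the table. The function name is illustrative.

```python
from decimal import Decimal


def centered_classes(first_center, step, count, last_digit=Decimal("0.001")):
    """Class limits that place each concentration point at a class mid-point.

    Each class runs from center - step/2 up to center + step/2 - last_digit,
    so that with step .05 the limits read .025-.074, .075-.124, and so on.
    """
    half = step / 2
    rows = []
    for k in range(count):
        center = first_center + k * step
        rows.append((center - half, center + half - last_digit, center))
    return rows


for lower, upper, mid_point in centered_classes(Decimal("0.05"), Decimal("0.05"), 3):
    print(f"{lower}-{upper}   {mid_point}")
```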

Selecting Mid-Points of Classes 

The next step in an analysis of data after the class intervals 
have been set up and the frequency distribution completed in the 
tally sheet, is the determination of the mid-point of each class. 
There has been some confusion in textbooks on statistics as to the 
location and the means of locating the mid-point of a class in- 
terval. The clearest, easiest, and most accurate method is to subtract 
the lower limit of the first class from the lower limit of the second 
class, divide the difference by 2 and add the half to the lower limit of 
the first class. This method clearly and accurately locates the mid- 
point of the first class. Since the classes are all the same width, 
the mid-points of the successive classes will be separated from the 
mid-point of the first class and from each other by the width of 
the class intervals. This relationship may be expressed and
computed from the following formula.

Formula No. 1

$$M_1 = \frac{L_2 - L_1}{2} + L_1$$

in which

$M_1$ = mid-point of first class
$L_1$ = lower limit of first class
$L_2$ = lower limit of second class

Thus, from Worksheet No. 5,

$$M_1 = \frac{50 - 41}{2} + 41 = 4.5 + 41 = 45.5$$

and again from Worksheet No. 6A,

$$M_1 = \frac{42 - 40}{2} + 40 = 1 + 40 = 41$$
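Formula No. 1 translates directly into code. The sketch below uses illustrative function names. It computes the mid-point of the first class. Because all classes are the same width, it then steps one class width at a time to get the mid-points of the classes that follow.

```python
def first_mid_point(l1, l2):
    """Formula No. 1: M1 = (L2 - L1) / 2 + L1."""
    return (l2 - l1) / 2 + l1


def mid_points(l1, l2, n_classes):
    """Mid-points of n equal classes; each lies one class width past the last."""
    width = l2 - l1
    m1 = first_mid_point(l1, l2)
    return [m1 + k * width for k in range(n_classes)]


print(first_mid_point(41, 50))      # 45.5  (Worksheet No. 5)
print(first_mid_point(40, 42))      # 41.0  (Worksheet No. 6, A)
print(mid_points(40.5, 42.5, 3))    # [41.5, 43.5, 45.5]  (Worksheet No. 7)
```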


This accurate mathematical method is demonstrated clearly by 
means of the following graph in which we think of a number occu- 
pying a point on a scale. 

Fig. 1. Scale from 40 to 42, with 41.0 at the center of the two-unit class.

In the above graph the specific whole number 40 is located at the
point indicated at the beginning of the two-unit class interval.
Likewise, the specific whole number 41 is located ten tenths, or one
unit, to the right, at the center of the two-unit class space. The
space on the line between 40 and 41 is filled not by 40 but by 40 +
various fractions of all conceivable sizes, such as 40.01, 40.1, 40.2,
40.5, 40.9, etc. Likewise, the space between 41 and the end of the
class, which goes up to but does not include 42, is filled not with 41,
but with 41 + all of the conceivable fractional parts of a unit that
exist between 41 and 42. It is clear from this graph that 41 is the
middle of the class.


Fig. 2. Scale from 40.5 to 46.4 marked in half units, showing the three-unit classes 40.5-43.4 and 43.5-46.4, with centers at 42.0 and 45.0.

$$M_1 = \frac{43.5 - 40.5}{2} + 40.5 = \frac{3}{2} + 40.5 = 1.5 + 40.5 = 42.0$$


The mid-point of any class may be quickly and accurately lo- 
cated by means of either the formula or the graph. In no case 
should we consider a number as occupying a space on the scale, 
but only a point. The space, that is, the indefinitely large number 
of points, making up the line between the two points occupied by 
the two terminal numbers, is filled only with the indefinitely 
large number of fractions which may exist between the two 
whole numbers. 


Graphs of Frequency Distributions 
When one has completed a tally sheet for a group of data and 
obtained a frequency distribution, he should make a picture of it 
in the form of a graph. This graph may be either a histogram or a 
frequency polygon. For comparative purposes, the frequency dis- 
tributions of the previous worksheets will be presented in his- 
tograms. 

A comparison of Figs. 3-6 makes quite evident the advantages
and disadvantages of various numbers and widths of class
intervals for a frequency distribution. If there are too few classes,
as in Figs. 3 and 4, the histogram will present a rough, square-like
angular form. Such a distribution is far removed from the
ideal smooth normal frequency curve shown in Fig. 64 (p. 332).
On the other hand, Fig. 6, with twenty-six class intervals, presents
a rough, irregular, or “snaggle-tooth” view of the data, because
it has too many classes for a sample of only 106 items. If the
sample contained 500 or perhaps 1,000 items, many if not most of
the gaps would be filled. But even in that case, half the number
of classes would yield an equal degree of accuracy and would
reduce the work of computation considerably. The distribution
in either Fig. 4 or Fig. 5 is superior. An intermediate range of
nine classes, each three inches wide, might be still better. This
comparison, however, is sufficient to guide the student into proper
methods of class selection.

Fig. 5. Histogram of frequency distribution of heights
of 106 grade-school children as shown in Worksheet
No. 6, A

Fig. 6. Histogram of frequency distribution of heights of 106
grade-school children as shown in Worksheet No. 3
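The histograms of Figs. 3-6 are not reproduced here, but a rough picture of a frequency distribution can be printed as text. In this sketch the names are illustrative. The bars run across the page rather than upward, and one `#` stands for one child. The classes and frequencies are those of Worksheet No. 7.

```python
# Classes (lower limits, 2-inch width) and frequencies from Worksheet No. 7.
WORKSHEET_7 = [(40.5, 1), (42.5, 4), (44.5, 2), (46.5, 18), (48.5, 13),
               (50.5, 14), (52.5, 23), (54.5, 10), (56.5, 7), (58.5, 10),
               (60.5, 3), (62.5, 0), (64.5, 1)]


def text_histogram(rows, width, mark="#"):
    """One line per class: its limits, then one mark per item in the class."""
    lines = []
    for lower, f in rows:
        upper = lower + width - 0.1          # upper limit stated to tenths
        lines.append(f"{lower:4.1f}-{upper:4.1f} | {mark * f}")
    return lines


for line in text_histogram(WORKSHEET_7, 2):
    print(line)
```

A class with no items, such as 62.5-64.4, prints as an empty bar. This is the kind of gap that makes Fig. 6 look ragged.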


A second and somewhat easier method of obtaining a picture of
a frequency distribution is the frequency polygon. A polygon is
a plane figure bounded by straight lines; the frequency polygon is
drawn as follows.

The class interval scale is laid down on the X-axis, and the
frequencies are measured on the Y-axis. The frequency of each
class, as measured on the Y-axis scale, is plotted as a dot or point
at the middle of the class interval space. These plotted points
are then joined by straight lines, making a polygon. If the sample
is quite large, 5,000 to 10,000 items, the polygon will become
smoother as the number of class intervals is increased, until it
approaches a smooth curve. A large number of class intervals for a
small sample will result in an irregular or saw-tooth polygon.
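The points plotted for a frequency polygon are simply the class mid-points paired with the class frequencies. This sketch uses the weights of Worksheet No. 8 and illustrative names. It lists the vertices and the straight segments that join them.

```python
# Worksheet No. 8: 5-pound classes starting at 35 pounds, frequencies in order.
WEIGHT_F = [1, 3, 12, 18, 14, 11, 13, 9, 4, 6, 3, 4, 4, 3, 1]


def polygon_points(first_lower, width, freqs):
    """(mid-point, frequency) vertices, one per class, in class order."""
    first_mid = first_lower + width / 2    # Formula No. 1 with L2 = L1 + width
    return [(first_mid + k * width, f) for k, f in enumerate(freqs)]


points = polygon_points(35, 5, WEIGHT_F)
segments = list(zip(points, points[1:]))   # straight lines joining the dots

print(points[:3])     # [(37.5, 1), (42.5, 3), (47.5, 12)]
print(len(segments))  # 14
```

Many later texts also close the polygon by adding a zero-frequency point one class width beyond each end. The text does not mention this, so the sketch leaves it out.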
The (Jay) J-Curve or Distribution
It must not be thought that all frequency distributions fall into
the form of the so-called normal curve, which shows a heavy
concentration of items near the center of the group. A type that is
often found is the “J-curve,” so-called because it roughly represents
a half U, or J. Figure 9 shows such a distribution.

WORKSHEET NO. 10

Class Intervals
(in cents)          Frequencies
5-14.9              512
15-24.9             297
25-34.9             176
35-44.9             108
45-54.9             76
55-64.9             45
65-74.9             32
75-84.9             19
85-94.9             12
95-104.9            7

Total               1,284


WORKSHEET NO. 11

Frequency Distribution of
100 Rifle Shots at 50 Feet

Class Intervals
in Millimeters      Frequencies
0-3.9               38
4-7.9               23
8-11.9              14
12-15.9             10
16-19.9             7
20-23.9             4
24-27.9             3
28-31.9             1

Total               100




Fig. 9. Frequency polygon of frequency distribution of number of
articles purchased at a grocery store by an average family in a year
(vertical axis: number of articles; horizontal axis: value in cents)
Such frequency distributions occur in populations which approach
a maximum at zero, or at minimum deviations from zero.
The frequency distribution of the number of individual incomes 
per class interval of incomes beginning at zero has the same gen- 
eral shape. 
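The J shape described above can be checked roughly by asking whether the frequencies never rise from one class to the next, so that the first class is the largest. This is only a rough descriptive check, not a standard statistical test. It is applied here to Worksheets Nos. 10 and 11 and, for contrast, to the heights of Worksheet No. 7.

```python
# Frequencies in class order, copied from the worksheets named.
GROCERY = [512, 297, 176, 108, 76, 45, 32, 19, 12, 7]      # Worksheet No. 10
RIFLE = [38, 23, 14, 10, 7, 4, 3, 1]                        # Worksheet No. 11
HEIGHTS_W7 = [1, 4, 2, 18, 13, 14, 23, 10, 7, 10, 3, 0, 1]  # Worksheet No. 7


def is_j_shaped(freqs):
    """Rough check: frequencies never rise from one class to the next,
    so the first class, nearest zero, holds the most items."""
    return all(a >= b for a, b in zip(freqs, freqs[1:]))


for name, freqs in [("grocery", GROCERY), ("rifle", RIFLE), ("heights", HEIGHTS_W7)]:
    share = freqs[0] / sum(freqs)
    print(f"{name}: J-shaped={is_j_shaped(freqs)}, first class holds {share:.0%}")
```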


SUMMARY 

1. Data are raw measurements or numbers which are to be analyzed 
by statistical methods. 

2. A statistical population or universe is a complete field of data. 

3. Data which are measured in indivisible units, such as wife, child, 
or soldier, are said to be discrete. 

4. Data which may be divided into infinitesimally small fractions, 
such as tons of coal, inches of wire, or acres of land, are said to be con- 
tinuous. 

5. An array is an arrangement of data in a schedule according to size, 
usually from the smallest item to the largest. 

6. A class interval is a span of units of data, such as 4 to 7 inclusive, 
or 4-7.9, and is usually of equal or uniform width for the entire range. 

7. A tally sheet is a device for arranging the raw data in a systematic 
order according to successive class intervals. The number of items in 
each class interval is the frequency of that class. 

8. The class limits should be arranged as nearly as possible so as to 
place the mid-points of the classes as near as possible to the most fre- 
quent value of data for each class. For instance, in 10-cent-store sales 
the mid-points of the classes should preferably be 5, 10, 15, etc. 

9. The mid-point of a class is easily located by subtracting the lower
limit of the class from the lower limit of the succeeding class, dividing
this difference by 2, and adding the quotient to the lower limit of the
class.

10. A frequency polygon is a graph of a frequency distribution in which
the class intervals are measured on the x-axis and the class frequencies
are measured on the y-axis.

11. The purpose of throwing data into frequency distributions is to 
economize time and effort in their analysis. 

12. The use of class intervals introduces a slight error into the com-
parisons. Since this error tends to increase as the number of classes is
reduced, it is ordinarily advisable to have not less than eight or ten classes.

13. Since the amount of work increases as the number of classes in-
creases, it is ordinarily not desirable to have more than twenty or
twenty-five classes. From twelve to fifteen is ordinarily a desirable number if
the data lend themselves conveniently to such a classification. 
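Items 5 to 9 describe a procedure that can be followed step by step: arrange the data in an array, mark off classes of equal width, tally the frequency of each class, and locate each mid-point by the rule of item 9. The Python sketch below is one way to carry it out, assuming that the first lower limit is at or below the smallest item. The ten data values and the function names `mid_point` and `frequency_distribution` are hypothetical, chosen only for illustration.

```python
def mid_point(lower, next_lower):
    """Item 9: half the distance to the lower limit of the next class,
    added to the lower limit of this class."""
    return lower + (next_lower - lower) / 2


def frequency_distribution(data, start, width):
    """Tally data into classes of equal width beginning at `start`.

    Returns (lower limit, next lower limit, mid-point, frequency) for each
    class; a class holds the items x with lower <= x < next lower limit.
    """
    array = sorted(data)                      # item 5: the array
    classes = []
    lower = start
    while lower <= array[-1]:
        upper = lower + width                 # lower limit of the next class
        frequency = sum(1 for x in array if lower <= x < upper)  # item 7
        classes.append((lower, upper, mid_point(lower, upper), frequency))
        lower = upper
    return classes


# Hypothetical data, for illustration only.
sample = [12, 17, 23, 14, 19, 21, 16, 25, 18, 11]
for lower, upper, mid, frequency in frequency_distribution(sample, 10, 5):
    print(f"{lower} to under {upper}: mid-point {mid}, frequency {frequency}")
```

Each class runs from its own lower limit up to, but not including, the lower limit of the next class, so the class printed as "10 to under 15" is the class 10-14.9 in the notation of item 6. With real data the width would be chosen so as to give the number of classes recommended in items 12 and 13.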
REVIEW QUESTIONS

1. What is a population or universe? Explain and name five examples.

2. What is a sample? 

3. How is an array constructed? What is its use? What informa- 
tion will it yield? 

4. Explain the use of a tally sheet. 

5. What are the advantages and disadvantages of class intervals? 

6. How wide should a class interval be? Explain fully. 

7. How many class intervals should there be for a frequency distribu- 
tion? Explain the advantages of a larger or smaller number. 

8. Why should class intervals be of equal width? 

9. When should the limits of class intervals be stated in fractions? 
Explain. 

10. Locate the mid-points of the following class intervals: 10-13.9, 
25-34.9, 12.5-17.4 and explain in detail how it is done. 

11. How is a histogram constructed? Give an example.

12. How is a frequency polygon drawn? Give an example.

13. How does the curve differ from the “normal” frequency distribution?
Name three fields of data in which it may be found.
EXERCISES *

1. Yield in bushels per acre of corn in Missouri, by counties, 1925. 
E. A. Logan, Statistician, U. S. D. A. 

35, 34, 36, 32, 34, 39, 34, 32, 35, 33, 39, 40, 40, 37, 28, 31, 31, 30, 34, 32, 

33, 31, 35, 35, 30, 35, 34, 29, 35, 35, 31, 35, 30, 35, 30, 34, 31, 27, 30, 16,

30, 37, 29, 40, 24, 18, 23, 32, 32, 29, 34, 34, 25, 30, 30, 28, 24, 28, 32, 26, 

31, 30, 24, 25, 25, 33, 23, 34, 35, 30, 32, 33, 29, 39, 27, 32, 33, 31, 29, 21, 

10, 21, 15, 25, 14, 17, 12, 16, 19, 24, 23, 27, 23, 21, 17, 24, 23, 22, 26, 19,

23, 20, 21, 28, 26, 29, 26, 29, 27, 29, 28, 28, 38, 27. 

2. Total assessments of railroad companies, by counties, in Oklahoma, 
1937, in $100,000. Third Biennial Report, Oklahoma Tax Commission. 
$11, 20, 20, 5, 10, 14, 24, 16, 21, 14, 5, 11, 14, 8, 7, 14, 3, 19, 18, 17, 3, 
5, 12, 33, 15, 25, 17, 5, 2, 4, 3, 20, 14, 10, 12, 36, 9, 18, 12, 38, 29, 22, 9, 
12, 8, 10, 11, 10, 15, 12, 29, 18, 7, 21, 10, 32, 20, 17, 10, 24, 19, 19, 10, 
3, 45, 17, 41, 18, 23, 23, 32, 12, 29, 12, 4, 21, 8. 

3. Gross production of petroleum in 100,000 barrels for 80 oil fields 
in Oklahoma, Kansas, and Texas, 1930. Petroleum Facts and Figures, 
A. P. I., 1931.

65, 6, 15, 11, 67, 12, 55, 57, 32, 26, 50, 4, 30, 100, 107, 23, 4, 16, 12, 43, 
25, 5, 60, 121, 54, 24, 17, 36, 38, 24, 340, 93, 10, 2, 115, 29, 48, 31, 67, 

11, 2, 25, 14, 28, 10, 14, 50, 3, 70, 10, 10, 4, 12, 78, 40, 127, 3, 74, 2, 2, 
59, 29, 40, 17, 37, 79, 7, 24, 12, 14, 18, 2, 25, 20, 3, 53, 5, 31, 72, 16. 

4. Grades of 107 college students from a comprehensive semester ex- 
amination in History 223, May, 1940, Oklahoma A. and M. College. 
(150 points were possible.) 

77, 100, 79, 102, 61, 86, 102, 72, 107, 96, 77, 117, 74, 86, 112, 96, 90, 68, 
91, 76, 95, 63, 93, 75, 88, 90, 83, 49, 120, 105, 81, 70, 137, 64, 92, 75, 77, 
83, 104, 78, 120, 122, 70, 96, 111, 87, 81, 96, 98, 123, 54, 98, 127, 117, 99, 
86, 103, 98, 115, 90, 93, 136, 142, 96, 67, 104, 94, 78, 99, 102, 110, 96, 109, 
97, 52, 128, 86, 111, 82, 93, 94, 115, 67, 115, 140, 84, 83, 125, 102, 104, 
65, 89, 79, 85, 80, 95, 120, 60, 59, 75, 125, 95, 134, 95, 104, 80, 118 

5. Fifty rifle shots, in millimeters from the center of the bull’s-eye, by
R. O. T. C. students at Oklahoma A. and M. College, February 12, 1942.

16, 0, 8, 1, 20, 18, 0, 0, 1, 3, 23, 5, 14, 1, 2, 0, 4, 10, 13, 2, 3, 0, 5, 6, 0, 1,
14, 3, 0, 11, 7, 2, 9, 2, 0, 6, 2, 0, 9, 12, 7, 4, 1, 1, 0, 1, 3, 5, 4, 11.

* See “Teacher’s Use of Exercises and Problems,” and the footnote for
Exercises at the end of Chapter 3.




CHAPTER 7 


TABULATION 


Tabulation in its broadest sense is any orderly arrangement of 
data in columns and rows. This definition includes worksheets. 
A worksheet, however, should be distinguished from the ordinary
table, because in a worksheet the data are always arranged ac- 
cording to some narrow and exact mathematical requirement and 
include the mathematical computations of the specific data for a 
specific purpose, such as a mean, a regression line, or correlation. 

Any meaningful tabulation must be based on some logical classi- 
fication of the data. It is hardly conceivable that any intelligent 
person would jumble data in a table of meaningless confusion. 
Some basis of logical arrangement may be taken for granted in 
any table. Sometimes the principle on which classification is
made is simple and is revealed to the reader by a glance at the
table. At other times the principle of organization is complex 
and obscured by a mass of details. In any case, some systematic 
principle of analysis should be worked out for each table before 
tabulation is begun. 

There are four basic principles for tabulation: 

1. Time sequence; 

2. Spatial sequence, or location; 

3. Quality, or order of characteristics; 

4. Quantity, or size. 

1. Time sequence is a universal human experience. We are 
continually conscious of the lapse of time and the sequence of 
events in time. The operations of a business, a social event, or a 
journey may be classified according to the time order in which 
they occur. All time series may be so organized. This is a fruit-
ful basis for classification if there is any logical relation between
the order of past events and those of the present or expected fu- 
ture. The business cycle, seasonal, or month-to-month variation 
in sales or production, changing inventory in a store, or the opera- 
tions of agriculture are illustrations of this principle with which 
the student is familiar. It has a wide application. 

2. Spatial sequence, or geographic location, is as universal as 
time. Data may be collected without reference to time, or dura- 
tion may be eliminated by taking data from many locations at 
one instant or point of time. The population data of any single 
census are for one point of time. They measure the number of 
persons by areas at the same point of time. In fact, all data of 
all kinds in any single census are for a point in time. They cannot 
be classified on a time sequence basis, but only on the bases of 
space, quality, or quantity. A succession of censuses is necessary 
to form a time series. Geographic subdivisions are a common 
and very useful basis for classification. Sales, production, wages, 
interest rates, deaths, births, rainfall, and any other data that may 
be associated with space may be classified by location. 

3. Quality and quality variations are universal facts in nature 
and in our thinking processes. Adverbs and adjectives and their 
comparisons make up a large part of all languages. We are con- 
tinually comparing good, better, and best; slow, slower, and 
slowest; sweet, sweeter, and sweetest. We could not buy, sell, or 
conduct business, or carry on social intercourse, or establish or 
operate political organizations without quality distinctions. This 
is one of the widest and most useful bases for tabulation. Careful 
and accurate discrimination among qualities, physical, mental, 
and social, is essential to its proper use. 

4. Distinctions between quantities are among the simplest and 
most necessary of our experiences. Much, more, and most; 
little, less, and least are quantitative comparisons necessary in 
sales, purchases, production, transportation, distribution, and 
consumption. Data which can be stated in numerical form can 
be tabulated according to quantity variation. Man has devel- 
oped a large number of units for measuring quantity such as the 
gram, ounce, pound, ton, inch, foot, yard, rod, mile, B.T.U.,
pint, quart, barrel, meter, dollar, franc, second, hour, and light- 
year. On the basis of such units, most data can be classified 
and tabulated. Bills of lading, inventory sheets, pay rolls, popu-
lation figures, number of births, imports, school enrollment, etc.,
are quantity tables.
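The four bases can be illustrated with a short Python sketch that classifies one small set of records four ways. The sales records, their field layout, and the function name `classify` are hypothetical; grades A to C are assumed to run from best to worst, which happens to match their alphabetical order.

```python
from collections import defaultdict

# Hypothetical sales records: (month, region, grade, dollars).
records = [
    (3, "East", "B", 120),
    (1, "West", "A", 300),
    (2, "East", "A", 90),
    (1, "East", "C", 40),
    (3, "West", "B", 210),
]


def classify(rows, key):
    """Group rows into classes by key(row); classes come out in sorted order."""
    classes = defaultdict(list)
    for row in rows:
        classes[key(row)].append(row)
    return dict(sorted(classes.items()))


by_time = classify(records, lambda r: r[0])        # 1. time sequence
by_place = classify(records, lambda r: r[1])       # 2. spatial sequence
by_quality = classify(records, lambda r: r[2])     # 3. quality (grade A best)
by_quantity = sorted(records, key=lambda r: r[3])  # 4. quantity, or size

for month, rows in by_time.items():
    print("month", month, "sales", sum(r[3] for r in rows))
```

The same records fall into different classes under each basis; which one to use depends on the relationship the table is meant to reveal.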

TABLES 

Various authors have classified and defined tables as general 
and specific, primary and secondary, summary and detailed. 
These definitions are somewhat useful but in the main are too 
vague and inaccurate to be of much value in guiding students in 
making tables. The best classifications rest on the form of the 
tables. The following types of tables may be clearly differentiated. 

1. Complete cross-classification tables 

2. Incomplete cross-classification tables 

3. Percentage comparison tables 

4. Frequency tables 

5. Cumulative tables 

Essential Parts of a Table

Before the various types of tables are analyzed in detail, it is 
necessary to have a clear idea of the essential parts of any table. 

Body of Table. The most important part of any table is the 
body of the table. This part is composed of the columns and 
rows or crossed vertical and horizontal lines and spaces which 
include the data. 

(Diagram: the body of a table, divided into columns and rows.)

The organization of this part of a table is of prime importance.
It is the table. Its form is determined by (1) the bases of
classification, (2) the number of classes and subclasses desired, and
(3) the relationships which it is desired to show among classes. Before
the body of a table can be formulated, it is necessary to study the 
data in detail and the purposes for which they are to be analyzed. 
If all characteristics of the data are to be revealed, there must be 
a class or subclass for each characteristic. If only some of the 
relationships are to be emphasized, the number of classes may be 
reduced. In any case, the statistician must decide what he wishes 
to reveal about the data before he can set up the form of the table. 
If all inter-relationships are to be shown, a complete cross-classi- 
fication table is necessary. If only some of the relationships are 
to be emphasized, an incomplete cross-classification table is suf- 
ficient. The beginning student may have to experiment with 
several forms before he secures the most appropriate one. A 
clear and adequate table is the result of much careful thought. 
No table is better than the planning that goes into its formation. 

Captions and Stubs. A caption is a heading of a column or
subcolumn. A stub is the heading or description of a row, or
cross-section, of a table. If the table is to be complete, it will have
one or more general class captions and several subclass captions. 
It will likewise have several stubs. The preceding table pattern 
is illustrative of these relationships. 

Table Totals. There are two sets of locations for table totals. 
If it is desirable to emphasize the totals, they should be placed at 
the top and at the left. This is their location in all tables in the 
various United States censuses and in many business reports. 
Since the totals are the largest and in many respects the most im- 
portant items in the table, they should be placed in the positions 
of emphasis. If, however, there is no purpose for emphasizing 
the totals, or one wishes to follow the traditional routine summa- 
tion of data, the totals may be placed at the bottom of the table 
and at the right. Accounting forms usually place totals at the 
bottom as in the balance sheet and profit and loss statement. 
Statistical analyses usually place the totals at the top of the table 
and follow the total with its breakdown into subclasses. 

Locations of Emphasis in Tables. Since the purpose of a 
statistical table is to make the meanings of data stand out in bold 
relief, it is necessary to exploit the locations of strongest contrast for 
comparisons and contrasts. 

1. The strongest position for comparison or contrast is the
vertical consecutive location, or the placing of one item immediately
above another in the same column, as


98765 

01234 


In this position the entire comparison is taken in at once without 
moving the eye. 

2. The second strongest position for comparison or contrast is
the horizontal consecutive location, or the placing of the two items in
adjoining columns of the same row, as


98765 


01234 
In this position the entire comparison or contrast is taken in with
one slight movement of the eye. 

3. The third strongest position of comparison or contrast is 
the vertical alternate location, or the placing of the items one above 
the other in the same column with one intervening item, as 


98765 

57975 

01234 


In this position the eye must pass over one intervening item to 
make the comparison, which delay slightly weakens the contrast. 

4. The fourth strongest position of comparison or contrast is 
the horizontal alternate location, or the placing of both items in 
the same row with one intervening item, as 


98765 


57975 


01234 


In this position the eye must move farther to grasp the signifi- 
cance of the items to be compared, which movement requires a 
longer lapse of time and more effort. 

5. Other positions of still weaker emphasis may be employed 
by placing the items still farther apart in the columns or in the 
rows. From these statements the basic principle of location for
emphasis is derived: the location in a table that is strongest for
comparison or contrast is the one which forces the idea into the
consciousness of the reader with the least expenditure of time and
energy on his part. It makes the idea stick out like a sore thumb.

IDENTIFICATION OF TABLES 

Every table is a distinct creation for some specific purpose. 
As an individual structure it must be named and numbered, and its
source of data given, so that it may be easily and quickly identified
and referred to in discussion.
1. Name or Title. Every table must have a title which should
answer the three questions What? Where? and When? with
the least expenditure of time and effort on the part of the reader.
It is usually effective to place the title of a table in the form of a
three-line inverted triangle, which answers in order the three
questions What? Where? and When? Examples of this type of title are

Height of Grade School Children 
Stillwater, Oklahoma 
1939 

and 

Production of Open Hearth Steel 
United States 
1941 
or 

Merchant Ships Sunk by Submarines 
in Atlantic Ocean 
May, 1942 

2. Number. Although a good title reveals at once the content 
and meaning of a table, it is a poor means of reference and identi- 
fication because of its length. In referring to a table in discussion, 
it is a waste of time to have to repeat its title at every reference. 
As a matter of economy for reference, all tables should be num- 
bered. The table numbers for a chapter, and preferably for the 
entire book, should be in consecutive order to facilitate location
in the body of the text. If they are numbered by chapters, the 
numerals should give the chapter number first followed by a 
period or colon followed by the table number, as 4.3, or 4:3, or 
17.2, 17:2, etc. It is usually better to number all the tables in a 
book in consecutive order, as Table 1, Table 2, . . ., Table 175, 
etc., or Exhibit 1, Exhibit 2 . . . Exhibit 207, etc. These numbers 
may be placed above the title at the top of the table or at the 
bottom of the table with the source reference. They should be
written in a type that is easy to locate and identify. 

3. Source Reference. The source of the data used in each 
table should be given in a reference at the bottom of the table, as 

* Survey of Current Business, United States Department 
of Commerce, May, 1940. 

or * Monthly Labor Review, United States Department of 
Labor, October, 1941, p. 763. 

or * Dun and Bradstreet, Inc., Dun’s Review (June, 1941),
p. 63.

or * The Babsonchart of Business Conditions, Babson’s
Statistical Organization, June, 1942.


COMPLETE CROSS-CLASSIFICATION TABLES 

In a complete cross-classification table every characteristic of 
the data included in the table can be read from a single point of 
data. The term “complete” does not mean that every fact and
point in the original data is included in the table, but that the
cross-classification is complete. For instance, if four classifications
are included in the table, the value of every item in the table is 
revealed in all four classes and can be read from one point in the 
table. 


I. One Classification
TABLE 2

Heights of Grade School Children
Stillwater, Oklahoma, 1939 *

Heights (inches)    Number of Children
All Heights                        106
40-49                               31
50-59                               67
60-69                                8

* Source: Worksheet No. 2, Chapter 6.
The one characteristic revealed in this table, that of height, is
completely classified, so that the reader can tell at one glance the 
variations in the number of children in all included classes of 
heights. 

II. Two Classifications
TABLE 3

Heights and Weights of Grade School Children
Stillwater, Oklahoma, 1939 *

                            Heights (inches)
Weights (pounds)   All Heights   40-49   50-59   60-69
All Weights                106      31      67       8
30-49                       16      13       3
50-69                       56      18      38
70-89                       22              21       1
90-109                      12               5       7

* Source: Worksheet No. 1, Chapter 6.


Table 3 was made by tallying the height and weight of each 
child given in Worksheet No. 1 (p. 96) in its appropriate cell in 
the table, and then accumulating the totals under the appropriate 
captions and opposite the appropriate stubs. The figures for 
each cell are summed for each column and each row in totals at 
the top and at the left. The student should note how complete
this cross-classified information is. This may be tested by
placing a pencil point on any figure in the table and asking:
“What are the heights and what are the weights of these
children?” We may illustrate the point by taking the figure 13,
the figure in the first column and the first row. What are the
heights of these 13 children? The answer comes at once: “They are
from 40 to 49 inches tall.” How much do they weigh? The answer
is complete: “From 30 to 49 pounds.” Try the 5 in the middle
column and bottom row: “They are from 50 to 59 inches tall and
weigh from 90 to 109 pounds.” This is a complete
cross-classification table, revealing at once all the information the
table is designed to give about every item in it. The classes may be
made as numerous and narrow as desired. It is a powerful device for
analysis.
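The way Table 3 is assembled can be sketched in Python using the cell frequencies printed in the table itself. The totals are computed from the cells and placed first, in the top row and the left column, and `pencil_point` reads both characteristics of a cell from the single key that locates it. The function names are hypothetical.

```python
from collections import Counter

HEIGHTS = ["40-49", "50-59", "60-69"]            # column captions, inches
WEIGHTS = ["30-49", "50-69", "70-89", "90-109"]  # row stubs, pounds

# Cell frequencies of Table 3, keyed by (weight class, height class).
# Empty cells are left out; a Counter reports them as 0.
cells = Counter({
    ("30-49", "40-49"): 13, ("30-49", "50-59"): 3,
    ("50-69", "40-49"): 18, ("50-69", "50-59"): 38,
    ("70-89", "50-59"): 21, ("70-89", "60-69"): 1,
    ("90-109", "50-59"): 5, ("90-109", "60-69"): 7,
})


def with_totals(freq, rows, cols):
    """Lay out the table with the totals first: top row and left column."""
    col_totals = [sum(freq[(r, c)] for r in rows) for c in cols]
    table = [["All Weights", sum(col_totals)] + col_totals]
    for r in rows:
        counts = [freq[(r, c)] for c in cols]
        table.append([r, sum(counts)] + counts)
    return table


def pencil_point(weight, height):
    """Read every characteristic of one cell from that single point."""
    return f"{cells[(weight, height)]} children, {height} in., {weight} lb."


for line in with_totals(cells, WEIGHTS, HEIGHTS):
    print(*line, sep="\t")
print(pencil_point("90-109", "50-59"))
```

Because every cell is keyed by both of its classes, no figure in the table can be reached without also knowing its height class and its weight class; that is the pencil point check expressed as a data structure.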

III. Three Classifications

TABLE 4

Heights, Weights, and Ages of Grade School
Children, Stillwater, Oklahoma, 1939 *

                                Heights (inches)
Weights and Ages       All Heights   40-49   50-59   60-69
All Weights                    106      31      67       8
  72-107 months old             55      30      25
  108-143 months old            45       1      40       4
  144-179 months old             6               2       4
30-69 pounds                    72      31      41
  72-107 months old             52      30      22
  108-143 months old            20       1      19
  144-179 months old
70-109 pounds                   34              26       8
  72-107 months old              3               3
  108-143 months old            25              21       4
  144-179 months old             6               2       4

* Source: Worksheet No. 1, Chapter 6.


THE CONSTRUCTION OF A THREE-CLASS TABLE 

Making a three-way table is not as easy a task as it appears at
first glance. Seventy-five percent of all students fail the first time
they attempt it. Since there are only two sides or dimensions to a
table, it is necessary to place two classes on one side when three
classes are used. Two of the classes may be placed on the top and
one on the side, or one on top and two on the side. The principle
to follow in deciding which of these two methods to use is: place
the larger number of subclasses on the side. The reason for this is
that there can be more rows than columns on a normal page.
In Table 4 there are 12 rows but only 5 columns. 

The second point to consider in making a three-way table is this:
when two classes are placed on one side, one of these two classes
must be put on the outside and the other class put under it and
repeated over and over again as many times as there are subclasses
and totals in the outside class. This is where students stumble.
They would do well to “stop, look, and listen” at this point. In
Table 4, two classes, weight and age, are placed on the side, and
height is put at the top. On the side where the two classes are,
weight is put on the outside, and the three subclasses of age in
complete form are repeated over and over, under the total and under
each subclass of weight. By no other process can a complete
cross-classification table be made.

Let us give Table 4 the pencil point check to test whether it is a 
complete cross-classification table. Place the point of your pencil 
on 4, the lowest right-hand figure in the table. Can all three 
characteristics of these four items be read from this one point? 
The answer is yes. If we read up the column, we find their
heights are from 60 to 69 inches. If we read across on the row,
we find they are from 144 to 179 months old and weigh from 70 
to 109 pounds. All the information about them in the table can 
be read from this one point. Try any other group in the table 
and you will get the same results. This principle is basic. It is 
the final test of the completeness of the cross-classification in any 
table and applies to all tables regardless of the number of classes 
and subclasses in them. 
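The rule for placing two classes on one side can be checked with a short Python sketch built from the non-empty cells of Table 4. `itertools.product` repeats the complete set of age stubs (the age total and its three subclasses) under the weight total and under each weight subclass, which gives the 12 rows of Table 4, and `count` computes every entry, totals included, from the cells. The names `leaf`, `count`, and `ALL` are hypothetical; empty cells print as 0 rather than as blanks.

```python
from itertools import product

ALL = "All"
WEIGHTS = ["30-69", "70-109"]             # outside class on the side, pounds
AGES = ["72-107", "108-143", "144-179"]   # inside class, repeated, months
HEIGHTS = ["40-49", "50-59", "60-69"]     # class across the top, inches

# Non-empty cells of Table 4: (weight, age, height) -> number of children.
leaf = {
    ("30-69", "72-107", "40-49"): 30, ("30-69", "72-107", "50-59"): 22,
    ("30-69", "108-143", "40-49"): 1, ("30-69", "108-143", "50-59"): 19,
    ("70-109", "72-107", "50-59"): 3,
    ("70-109", "108-143", "50-59"): 21, ("70-109", "108-143", "60-69"): 4,
    ("70-109", "144-179", "50-59"): 2, ("70-109", "144-179", "60-69"): 4,
}


def count(weight, age, height):
    """Children in one cell; ALL in any position totals over that class."""
    return sum(
        n for (w, a, h), n in leaf.items()
        if weight in (ALL, w) and age in (ALL, a) and height in (ALL, h)
    )


# Stubs: the age total and every age subclass, repeated under the weight
# total and under every weight subclass (3 x 4 = 12 rows).
stubs = list(product([ALL] + WEIGHTS, [ALL] + AGES))

for weight, age in stubs:
    row = [count(weight, age, h) for h in [ALL] + HEIGHTS]
    print(f"{weight:>7} {age:>8}", *(f"{n:4}" for n in row))
```

Putting the weight classes first in `product` is what places weight on the outside; swapping the two arguments would put age on the outside and repeat the weight stubs instead.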

A four-way cross-classification table, such as Table 5, is a series of
sections, or subcells, each one of which is a table in itself. All the
subtotals of all these sections are accumulated in the grand-total
sections at the top and at the left. A good practice for the student who
is learning to make four-way tables is to give several of the items
in Table 5 the pencil point check. All four characteristics of
each item in that table can be read from one point.

IV. Four Classifications
TABLE 5

Heights, Weights, Ages, and Sex of Grade School Children
Stillwater, Oklahoma, 1939 *

                                        Sex of Children
                     Total              Male               Female
                    Heights (inches)   Heights (inches)   Heights (inches)
Weights and Ages     All 40- 50- 60-    All 40- 50- 60-    All 40- 50- 60-
                          49  59  69         49  59  69         49  59  69
All Weights          106  31  67   8     55  16  35   4     51  15  32   4
  72-107 months old   55  30  25         29  15  14         26  15  11
  108-143 months old  45   1  40   4     22   1  20   1     23      20   3
  144-179 months old   6       2   4      4       1   3      2       1   1
30-69 pounds          72  31  41         38  16  22         34  15  19
  72-107 months old   52  30  22         27  15  12         25  15  10
  108-143 months old  20   1  19         11   1  10          9       9
  144-179 months old
70-109 pounds         34      26   8     17      13   4     17      13   4
  72-107 months old    3       3          2       2          1       1
  108-143 months old  25      21   4     11      10   1     14      11   3
  144-179 months old   6       2   4      4       1   3      2       1   1

* Source: Height, Weight, Age, and Sex Charts of Stillwater Grade Schools, 1939.

A five-way table could be constructed from these data by adding
to the four classes included in Table 5 a fifth class, schools,
namely Eugene Field, Jefferson, and Lincoln. Such a five-way table
would be too large to include on a single page of a text. If it
were set up, it would be preferable to put three classes on the side
and leave the two classes of height and sex as they are in Table 5.
If a third class, schools, were put on the side, it would be put on
the outside of the two classes of weight and age, and the 12 rows of
the present table with their present stubs would be repeated over
and over under (1) All Schools, (2) Eugene Field, (3) Jefferson,
and (4) Lincoln, four times in all. The five-way table would
have the same width as Table 5, but would be four times as long.
This rapid expansion in size of complete cross-classifications with 
the addition of each new variable requires a large sheet of paper or 
the breaking up of the total table and the presentation of each 
separate section or cell on a separate page, if more than four 
main classifications are used. In the United States Decennial 
Census, some of the larger tables showing data by states, counties, 
age, sex, marital condition, etc., cover many pages. 
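The growth in size can be counted directly: each class contributes its subclasses plus one line for its total, and classes stacked on the same side multiply. The Python sketch below uses the subclass counts of Tables 4 and 5 (2 weight classes, 3 age classes, 3 height classes, 2 sexes) and the three schools named above; the function name `table_shape` is hypothetical.

```python
from math import prod


def table_shape(side, top):
    """Rows and data columns of a complete cross-classification table.

    `side` and `top` list the number of subclasses of each class placed
    on that side; every class adds one line for its total ("All"), and
    classes stacked on the same side multiply.
    """
    rows = prod(n + 1 for n in side)
    columns = prod(n + 1 for n in top)
    return rows, columns


print(table_shape([2, 3], [3]))        # Table 4 (weight, age | height): (12, 4)
print(table_shape([2, 3], [2, 3]))     # Table 5 (weight, age | sex, height): (12, 12)
print(table_shape([3, 2, 3], [2, 3]))  # five-way, schools added: (48, 12)
```

The 4 data columns of Table 4 plus the stub column give the 5 columns mentioned earlier, and adding schools on the side leaves the width at 12 data columns while multiplying the 12 rows by 4.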
The complete cross-classification table is a powerful tool for
the analysis of data. A much more complete discussion of the 
analytical possibilities of such tables will be given in Chapter 13. 
Table 3 clearly reveals the relation between the heights and 
weights of grade school children. Table 4 shows this relationship 
modified by the age of children. 

INCOMPLETE CROSS-CLASSIFICATION TABLES 

Frequently an incomplete cross-classification table is sufficient 
for the analysis desired, and since it is easier to construct, it can 
be used to advantage in such cases. 

I. Two Classifications, Incomplete

TABLE 6

Heights and Weights of Grade School Children
Stillwater, Oklahoma, 1939 *

        Heights (inches)                      Weights (pounds)
   Total   40-49   50-59   60-69     Total   30-49   50-69   70-89  90-109
     106      31      67       8       106      16      56      22      12

* Source: Worksheet No. 2, Chapter 6.

If we apply the pencil point check to Table 6, we at once dis- 
cover that its cross-classification is incomplete. Let us take the 
12 in the last column under Weights. How much do these 12 
children weigh? The answer is, “90 to 109 pounds.” How tall
are they? There is no answer. The table does not give the
information. It is true that there are 8 children from 60 to 69 inches
tall, but there is no way of knowing whether any of the 12
children are included in the 8. The pencil point check proves that
the cross-classification of this table is incomplete. If there is no
present need for such a cross-classification, Table 6 is sufficient.
If all we wish to know is the range and frequency of the heights 
and the range and frequency of the weights without revealing any 
cross-relations, the above table is the kind to construct, because 
it is easier, quicker, and cheaper to make than a complete cross- 
classification table for the same data. 
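The failure of the pencil point check on Table 6 can be demonstrated in Python. The sketch collapses the cells of Table 3 to the two margins that make up Table 6, then does the same for a hypothetical rearrangement of those cells in which every row and column total is unchanged. Both arrangements give the same Table 6, so Table 6 alone cannot tell which one is true. The rearranged figures and the function name `margins` are invented for illustration.

```python
from collections import Counter

# Cells of Table 3, keyed by (weight class, height class).
table_3 = {
    ("30-49", "40-49"): 13, ("30-49", "50-59"): 3,
    ("50-69", "40-49"): 18, ("50-69", "50-59"): 38,
    ("70-89", "50-59"): 21, ("70-89", "60-69"): 1,
    ("90-109", "50-59"): 5, ("90-109", "60-69"): 7,
}

# Hypothetical rearrangement: one 60-69-inch child moved from the 90-109-
# pound row to the 70-89-pound row, and one 50-59-inch child moved the
# other way, so every row and column total stays the same.
rearranged = dict(table_3)
rearranged[("70-89", "50-59")] = 20
rearranged[("70-89", "60-69")] = 2
rearranged[("90-109", "50-59")] = 6
rearranged[("90-109", "60-69")] = 6


def margins(cells):
    """Collapse a complete table to the two one-way margins of Table 6."""
    weights, heights = Counter(), Counter()
    for (weight, height), n in cells.items():
        weights[weight] += n
        heights[height] += n
    return weights, heights


print(margins(table_3) == margins(rearranged))  # True: the same Table 6
print(table_3 == rearranged)                    # False: different cells
```

The margins can always be computed from a complete table, but not the other way round, which is why the complete table costs more work to make.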
II. Three Classifications, Incomplete

TABLE 7

Heights, Weights, and Ages of Grade School Children
Stillwater, Oklahoma, 1939 *

  Heights (inches)           Weights (pounds)            Ages (months)
 Total  40-  50-  60-    Total  30-  50-  70-  90-    Total  72- 108- 144-
         49   59   69            49   69   89  109           107  143  179
   106   31   67    8      106   16   56   22   12      106   55   45    6

* Source: Worksheet No. 2, Chapter 6.


The data in Table 7 may also be presented in vertical form, as 
in Table 8. 


III. Three Classifications, Incomplete
TABLE 8

Heights, Weights, and Ages of Grade School
Children, Stillwater, Oklahoma, 1939 *

Heights (inches)
  Total           106
  40-49            31
  50-59            67
  60-69             8
Weights (pounds)
  Total           106
  30-49            16
  50-69            56
  70-89            22
  90-109           12
Ages (months)
  Total           106
  72-107           55
  108-143          45
  144-179           6

* Source: Worksheet No. 2, Chapter 6.
If cross-classification is not desired, there is no advantage in
including more than one class in one table, except for convenience
of space and reference. The three classes in Table 8 might just 
as well have been placed in three separate tables. 

IV. Four Classifications, Partly Complete
TABLE 9

Heights, Weights, Ages, and Sex of Grade School
Children, Stillwater, Oklahoma, 1939 *

                         Sex
                  Total   Male Female
Heights, Total      106     55     51
  40-49              31     16     15
  50-59              67     35     32
  60-69               8      4      4
Weights, Total      106     55     51
  30-49              16      9      7
  50-69              56     29     27
  70-89              22     10     12
  90-109             12      7      5
Ages, Total         106     55     51
  72-107             55     29     26
  108-143            45     22     23
  144-179             6      4      2

* Source: Height, Weight, Age, and Sex Charts of
Stillwater Grade Schools, 1939.


In Table 9 there is cross-classification between sex and height,
between sex and weight, and between sex and age, but there is no
cross-classification among height, weight, and age. This is a partly
complete cross-classification table. This type of table is sufficient for many
purposes and is widely used. Many of the tables in the United 
States censuses are of this type. They are also useful in business 
reports and in research work. 
PERCENTAGE COMPARISON TABLES

Percentages are the most generally used method of comparison 
in the business world, if not in all life’s activities. The simple 
decimal system of tens and hundreds long ago became so funda- 
mental a part of our daily methods of thinking that it may truly 
be said that all literate persons usually make comparisons on the 
basis of 100. This fact makes the percentage comparison table one
of the most useful in statistical presentation. It is especially 
useful in comparing values which differ greatly in size, such as a 
large store and a small store, a whole industry and a single plant 
in that industry, one department and an entire plant, exports 
and imports, and other such divergent values. 

I. Pure Percentage Comparisons
TABLE 10

Comparative Expenditures of America’s Three Largest Cities

Percentage That Expenditures for Public Safety, Highways, Sanitation,
Hospitals, Schools, Libraries, and Recreation Are of Total Expenditures
of New York, Chicago, and Philadelphia in 1938 *

Type of Expenditure   New York   Chicago  Philadelphia
Totals                 100.00%   100.00%       100.00%
Public Safety           25.93     26.42         29.23
Highways                 6.46      8.92          4.29
Sanitation               9.63      8.82          6.16
Hospitals                8.54      5.65          7.13
Schools                 45.01     41.15         48.04
Libraries                0.87      1.32          1.19
Recreation               3.56      7.72          3.96

* Source: Financial Statistics of Cities of Over 100,000 Population, 1938,
Table 17, pp. 164-167, Bureau of the Census.

Percentage comparisons such as are given in Table 10 are easy 
to understand and are excellent for comparative analysis. Al- 
though New York is twice as large as Chicago and is three times 
the size of Philadelphia, one can see at a glance that Philadelphia 
spends relatively more on public safety and on schools than its
sister cities. New York spends relatively more on sanitation, and
Chicago more on libraries and recreation. Excellent as this type of table is
for comparisons of economic, sociological, political, and other data, 
it has the one major defect that it leaves out the original data on 
which the percentages are based. In some cases this defect is of 
no serious consequence. In other cases a review of the absolute 
values is very necessary. Both the original data and the per- 
centages may be combined in one table. 


II. Mixed Percentage Comparisons
TABLE 11

Comparative Expenditures of America’s Three Largest Cities

Expenditures and Percentage That Expenditures for Public Safety,
Highways, Sanitation, Hospitals, Schools, Libraries, and Recreation Are
of Total Expenditures of New York, Chicago, and Philadelphia in 1938 *

                  Expenditures (in $1,000)     Percentages of Expenditures
Type of               New              Phila-        New            Phila-
Expenditure          York   Chicago   delphia       York  Chicago  delphia
Totals           $357,835  $114,961   $57,546    100.00%  100.00%  100.00%
Public Safety      92,789    30,383    16,820     25.93    26.42    29.23
Highways           23,095    10,245     2,468      6.46     8.92     4.29
Sanitation         34,446    10,150     3,548      9.63     8.82     6.16
Hospitals          30,576     6,499     4,100      8.54     5.65     7.13
Schools           161,970    47,299    27,648     45.01    41.15    48.04
Libraries           3,123     1,522       684      0.87     1.32     1.19
Recreation         12,731     8,863     2,278      3.56     7.72     3.96

* Source: Financial Statistics of Cities of Over 100,000 Population, 1938,
Table 17, pp. 164-167, Bureau of the Census.


Table 11 is one of the most useful forms for comparing balance 
sheets, profit and loss statements, sales, production, costs, ex- 
penses, population, purchasing power, and other such important 
factors in business management or general statistical analysis. 
The reason for placing all the data, or absolute numbers, on one 
side of the table and all the percentage values on the other side is
that this method places the items to be compared in the strongest 
positions of contrast, percentages near percentages, and absolute 
numbers next to each other. If it is desired, the absolute figures 
and the percentages for each city could be placed in adjacent 
columns. Such tables are frequently used. 
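The layout just described, with the absolute figures in one block and the percentages in the other, can be produced mechanically. A sketch in Python, assuming only the Chicago and Philadelphia columns of Table 11 for brevity; `mixed_table` is a hypothetical name, and the percentages are recomputed rather than copied from the table.

```python
CITIES = ["Chicago", "Philadelphia"]

# Chicago and Philadelphia columns of Table 11, in thousands of dollars.
DATA = {
    "Public Safety": {"Chicago": 30_383, "Philadelphia": 16_820},
    "Highways": {"Chicago": 10_245, "Philadelphia": 2_468},
    "Sanitation": {"Chicago": 10_150, "Philadelphia": 3_548},
    "Hospitals": {"Chicago": 6_499, "Philadelphia": 4_100},
    "Schools": {"Chicago": 47_299, "Philadelphia": 27_648},
    "Libraries": {"Chicago": 1_522, "Philadelphia": 684},
    "Recreation": {"Chicago": 8_863, "Philadelphia": 2_278},
}


def mixed_table(data, cities):
    """Lay out absolute figures in a left block and percentages in a right block."""
    totals = {c: sum(row[c] for row in data.values()) for c in cities}

    def line(label, left, right):
        # Absolute numbers sit beside absolute numbers, percentages beside percentages.
        return f"{label:<15}" + "".join(left) + "    " + "".join(right)

    lines = [line("", [f"{c:>14}" for c in cities], [f"{c:>14}" for c in cities])]
    lines.append(line("Totals",
                      [f"{totals[c]:>14,}" for c in cities],
                      [f"{100:>13.2f}%" for c in cities]))
    for item, row in data.items():
        lines.append(line(item,
                          [f"{row[c]:>14,}" for c in cities],
                          [f"{100 * row[c] / totals[c]:>14.2f}" for c in cities]))
    return lines


print("\n".join(mixed_table(DATA, CITIES)))
```

Placing the two cities' figures for one item in adjacent columns would need only a change to the `line` helper.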

FREQUENCY TABLES 

In Chapter 6, Worksheets Nos. 2-9 are frequency tables. The 
student is asked to review the technique and methods explained 
in Chapter 6 for the construction of such tables. They have 
a wide use in statistics as the basis of further analysis. It is 
unnecessary to repeat at this point the long and detailed explana- 
tions of the principles that should be followed in the construction 
of such tables. The reason that they are called worksheets in- 
stead of tables in the body of the text is that their form is dictated 
and severely limited by the nature of the mathematical computa- 
tions which are based upon them. A frequency table should always 
be constructed in conformity with the five basic principles explained
in Chapter 6 and the limitations which the mathematical
manipulation of the data requires. They are a highly specialized
type of table, simple in form but based on narrow and exact 
principles of statistical analysis which must be followed strictly 
if dependable results are to be obtained. At this point Chapter 6 
should be reviewed. 


CUMULATIVE TABLES 

A cumulative table is one which carries a value or series of 
values through a set of computations and analysis to some specific 
summation which either increases or decreases the initial figures 
as the processes of the analysis may require. In its simplest form 
it is a successive cumulation or decrease of quantities. The in- 
creasing cumulation measures the total number which has come 
into existence at the end of each succeeding class; the decreasing
cumulation measures the number remaining in existence at the end 
of each class. 
I. Cumulative Frequency Table

TABLE 12 

Frequencies and Increasing and Decreasing Cumulation
of Frequencies of Heights of Grade School
Children, Stillwater, Oklahoma, 1939 *

Class Intervals     Frequencies        Increasing       Decreasing
Heights in          No. of Children    Cumulation of    Cumulation of
Inches              in Each Class      Frequencies      Frequencies

40.5-42.4                1                  1               106
42.5-44.4                4                  5               105
44.5-46.4                2                  7               101
46.5-48.4               18                 25                99
48.5-50.4               13                 38                81
50.5-52.4               14                 52                68
52.5-54.4               23                 75                54
54.5-56.4               10                 85                31
56.5-58.4                7                 92                21
58.5-60.4               10                102                14
60.5-62.4                3                105                 4
62.5-64.4                0                105                 1
64.5-66.4                1                106                 1

Totals                 106

* Source: Worksheet No. 7, Chapter 6.


An analysis such as may be made by Table 12 is very useful in 
revealing the number of railway ties, telephone poles, or other 
fixed capital goods that will be standing or in use at the end of 
successive periods of time. It is also used in making mortality 
tables for insurance companies. It can be made to apply to any 
situation in which there is a changing number of items in suc- 
cessive classes or at succeeding periods of time. 
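Both cumulations in Table 12 are running sums of the frequency column, one taken from the top and one from the bottom. A minimal Python sketch using the Table 12 frequencies; the decreasing column counts the children in a class together with all higher classes, which is how the printed column beginning 106, 105, 101 is built.

```python
from itertools import accumulate

# Frequencies from Table 12, lowest height class first.
frequencies = [1, 4, 2, 18, 13, 14, 23, 10, 7, 10, 3, 0, 1]
# Class intervals are two inches wide, starting at 40.5-42.4.
classes = [f"{40.5 + 2 * i:.1f}-{42.4 + 2 * i:.1f}" for i in range(len(frequencies))]

# Increasing cumulation: children in this class or any lower class.
increasing = list(accumulate(frequencies))
# Decreasing cumulation: children in this class or any higher class.
decreasing = list(accumulate(reversed(frequencies)))[::-1]

for label, f, up, down in zip(classes, frequencies, increasing, decreasing):
    print(f"{label:<11}{f:>5}{up:>7}{down:>7}")
```

The last increasing figure and the first decreasing figure are both the total, 106, which is a quick check on the work.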


II. Special Cumulative Tables 

The best examples of special purpose cumulative tables are the
accounting Profit and Loss Statement and the periodic Balance
Sheet. In the typical profit and loss statement one begins with
(1) total sales, from which is deducted (2) cost of sales, which
gives (3) gross returns from sales. From this is subtracted (4)
sales expenses, which gives (5) net profits from sales.


TABLE 13 

Consolidated Income Account of Mid-Continent Petroleum 
Corporation, for Year Ending December 31, 1940* 


Sales                                                  $37,876,796
Cost of Sales                                           25,579,552
Gross Profit from Sales                                 12,297,244
Selling, general, and administrative expenses            6,026,257
Gross Profit                                             6,270,987
Other Income                                             1,145,146
Total Income                                             7,416,134
    Depreciation                         $2,193,403
    Depletion                               817,230
    Leaseholds abandoned                  1,073,828
    Federal and State Taxes                 364,278
Total deductions from Total Income                       4,448,740
Net Income                                               2,967,394
Dividends                                                1,488,251
Surplus for Year                                        $1,479,143


* Source: Moody’s Manual of Investments, Industrials, 1941, 

p. 1282. 
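The statement in Table 13 is a chain of additions and subtractions, each line carrying the result of the one above. A sketch of that chain in Python, using the figures printed in the table. Summed item by item, Total Income comes to $7,416,133 and the four deductions to $4,448,739, each one dollar less than the printed totals (presumably rounding in the source); Net Income and the Surplus for Year agree with the table.

```python
# Items of Table 13, in dollars.
sales = 37_876_796
cost_of_sales = 25_579_552
selling_general_admin = 6_026_257
other_income = 1_145_146
deductions = {
    "Depreciation": 2_193_403,
    "Depletion": 817_230,
    "Leaseholds abandoned": 1_073_828,
    "Federal and State Taxes": 364_278,
}
dividends = 1_488_251

# Each step carries the running figure forward, as the statement does.
gross_profit_from_sales = sales - cost_of_sales
gross_profit = gross_profit_from_sales - selling_general_admin
total_income = gross_profit + other_income
total_deductions = sum(deductions.values())
net_income = total_income - total_deductions
surplus = net_income - dividends

for label, value in [
    ("Gross Profit from Sales", gross_profit_from_sales),
    ("Gross Profit", gross_profit),
    ("Total Income", total_income),
    ("Total deductions", total_deductions),
    ("Net Income", net_income),
    ("Surplus for Year", surplus),
]:
    print(f"{label:<26}{value:>14,}")
```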

The ordinary bank statement and the commercial statement are 
samples of cumulative tables. (See page 137.) 

Among other tables of this type are the perpetual inventory 
card, bills of lading, individual accounts, installment purchases 
accounts, income tax forms, and many other tables, forms, state- 
ments, and accounts universally employed in the business world. 
An intimate acquaintance with them is an essential part of a 
business education and a mastery of business statistics. 

The five types of tables analyzed in this chapter cover all the 
essential points in tabulation. The few types, however, may be 
varied and combined into an almost endless variety of tables 
which are suited to every kind and degree of statistical presentation. 
TABLE 14


Consolidated Balance Sheet of Mid-Continent 
Petroleum Corporation, December 31, 1940 * 


Assets

Refined and crude products                  $ 8,793,126
Materials and supplies                        1,280,368
Notes and accounts receivable                 2,763,090
Cash                                          8,913,655
U.S. Treasury obligations                     3,252,369
    Total current assets                    $25,002,608
Oil reserves, leaseholds, etc.               20,343,665
Plant and equipment                          16,730,859
Investments (cost)                            3,356,696
Deferred debit items                            459,012
    Total assets                            $65,892,841

Liabilities

Accounts payable                            $ 2,802,611
Accrued taxes                                   650,428
    Total current liabilities               $ 3,453,039
Common stock                                 18,579,120
Reserves for contingencies                    1,156,564
Minority interest                                76,448
Capital surplus                              20,448,203
Surplus from operations                      22,179,467
    Total liabilities                       $65,892,841

Net current assets                           21,549,569


* Source: Moody’s Manual of Investments, Industrials,
1941, p. 1282.
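The subtotals and the net current assets line of Table 14 can be checked the same way. A Python sketch with the printed items; net current assets is taken, as the figures confirm, to be total current assets less total current liabilities. The asset items add to $65,892,840, one dollar less than the printed total, apparently because of rounding in the source.

```python
current_assets = {
    "Refined and crude products": 8_793_126,
    "Materials and supplies": 1_280_368,
    "Notes and accounts receivable": 2_763_090,
    "Cash": 8_913_655,
    "U.S. Treasury obligations": 3_252_369,
}
other_assets = {
    "Oil reserves, leaseholds, etc.": 20_343_665,
    "Plant and equipment": 16_730_859,
    "Investments (cost)": 3_356_696,
    "Deferred debit items": 459_012,
}
current_liabilities = {
    "Accounts payable": 2_802_611,
    "Accrued taxes": 650_428,
}
other_liabilities = {
    "Common stock": 18_579_120,
    "Reserves for contingencies": 1_156_564,
    "Minority interest": 76_448,
    "Capital surplus": 20_448_203,
    "Surplus from operations": 22_179_467,
}

total_current_assets = sum(current_assets.values())
total_assets = total_current_assets + sum(other_assets.values())
total_current_liabilities = sum(current_liabilities.values())
total_liabilities = total_current_liabilities + sum(other_liabilities.values())
# Working capital: what current assets exceed current liabilities by.
net_current_assets = total_current_assets - total_current_liabilities

print(f"Total current assets       {total_current_assets:>14,}")
print(f"Total current liabilities  {total_current_liabilities:>14,}")
print(f"Net current assets         {net_current_assets:>14,}")
print(f"Total assets (summed)      {total_assets:>14,}")
print(f"Total liabilities (summed) {total_liabilities:>14,}")
```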


The purpose of the present chapter is to describe the various 
types of tables and the technique of their construction. Chapter 
13 will indicate how these various types of tables may be employed 
in statistical analysis. This kind of analysis is widely used in all 
types of research work and in business reports and should be mas- 
tered thoroughly by the student. Without such mastery he will 
find himself handicapped at many points in his later work. 
TABLE 15
Mr. John Doe

Statement of Account with the
First National Bank, Centerville, New York
January, 1942 *

  Checks     Deposits    Date                  Balance

                         December 31, 1941     $325.54
$ 25.50                  January 2, 1942        300.04
   13.43                 January 2, 1942        286.61
    4.80                 January 2, 1942        281.81
     .56                 January 3, 1942        281.25
   15.00                 January 3, 1942        266.25
    3.51                 January 5, 1942        262.74
   12.40                 January 5, 1942        250.34
   40.00     $250.00     January 9, 1942        661.34
              201.00
    7.50                 January 12, 1942       653.84
  587.50                 January 12, 1942        66.34
    9.31                 January 15, 1942        57.03
   10.00                 January 19, 1942        47.03
    5.82                 January 19, 1942        41.21
   15.00                 January 21, 1942        26.21
   10.00                 January 22, 1942        16.21
    4.95                 January 27, 1942        11.26
   10.00                 January 30, 1942         1.26


* Source: Accounts, First National Bank, Centerville, 
New York, February 1, 1942. 
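Each balance in Table 15 is the previous balance plus the day's deposits minus its checks. A minimal Python sketch that rebuilds the balance column from the opening balance of $325.54; `running_balances` is a hypothetical name, and `Decimal` is used so that cents are added exactly rather than as binary fractions.

```python
from decimal import Decimal

OPENING_BALANCE = Decimal("325.54")  # December 31, 1941

# (date, checks, deposits) for each line of Table 15.
ROWS = [
    ("Jan. 2", ["25.50"], []),
    ("Jan. 2", ["13.43"], []),
    ("Jan. 2", ["4.80"], []),
    ("Jan. 3", [".56"], []),
    ("Jan. 3", ["15.00"], []),
    ("Jan. 5", ["3.51"], []),
    ("Jan. 5", ["12.40"], []),
    ("Jan. 9", ["40.00"], ["250.00", "201.00"]),
    ("Jan. 12", ["7.50"], []),
    ("Jan. 12", ["587.50"], []),
    ("Jan. 15", ["9.31"], []),
    ("Jan. 19", ["10.00"], []),
    ("Jan. 19", ["5.82"], []),
    ("Jan. 21", ["15.00"], []),
    ("Jan. 22", ["10.00"], []),
    ("Jan. 27", ["4.95"], []),
    ("Jan. 30", ["10.00"], []),
]


def running_balances(opening, rows):
    """Carry the balance forward line by line: add deposits, subtract checks."""
    balance = opening
    result = []
    for date, checks, deposits in rows:
        balance += sum((Decimal(d) for d in deposits), Decimal(0))
        balance -= sum((Decimal(c) for c in checks), Decimal(0))
        result.append((date, balance))
    return result


balances = running_balances(OPENING_BALANCE, ROWS)
for date, balance in balances:
    print(f"{date:<8}{balance:>9}")
```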


SUMMARY 

1. Tabulation is an orderly arrangement of data. 

2. Tabulations are usually based on one or more of the following four 
bases: (1) Time sequence, (2) Spatial sequence, or location, (3) Quality, 
or (4) Quantity. 

3. Captions are the designations or names of the columns at the top 
of a table. 

4. Stubs are the designations or names of the lines at the left side of 
a table. 

5. A complete cross-classification table is one in which the descriptive 
captions and stubs for any data may be read from one point in the table. 
All information on these particular data is available from the point of its
location in the table. 

6. The point of greatest emphasis in a table is the point at which the 
values to be compared are closest together. This is a vertical contiguous 
position, or one directly above the other. 

7. All tables should have brief descriptive titles and numbers for 
identification or reference. 

8. All tables should have a statement, usually at the bottom, indicating 
the source of the data tabulated. 

9. Tables are the most extensively used method of organizing and pre- 
senting data either in the press or in statistical studies. 
REVIEW QUESTIONS

1. What is tabulation? 

2. What are the logical bases of tabulation? What are the charac- 
teristics of each? 

3. What considerations should determine the basis of tabulation in 
any particular table? 

4. Why is it better to classify tables according to form instead of use 
and purpose? Explain. 

5. What are the five principal forms into which tables may be clas- 
sified? 

6. What is required to make a complete cross-classification table? 

7. How may one check a table to determine whether it is a complete 
cross-classification table? Explain. 

8. What are the essential parts of a table? 

9. What are captions, stubs, columns, and rows? Explain fully. 

10. What is meant by positions of emphasis? What are the four 
strongest ones in the order of decreasing strength? Why? Explain fully. 
11. What information should the title of a table give? Why?

12. When more than two classes are used in one complete cross-classifi- 
cation table, where must the third or fourth classes be placed? Explain. 

13. What are the principal uses of percentage tables? Explain fully. 

14. Name and explain the uses of four cumulative tables. 

15. What are the principles on which frequency tables should be con- 
structed? Explain. 

16. What are the prime essentials in producing an excellent table? 


EXERCISES 

No. 1. Data for Complete Cross-Classification Tables 
Thirty students enrolled in A. and M. College during the year 1933-34
Key: 

Column 1 — Class. Fr = Freshman, So = Sophomore, Jr = Junior, 
Sr = Senior. 

Column 2 — Schools. Ag = Agriculture, Co = Commerce, Ed = Edu- 
cation, En = Engineering, HE = Home Economics, SL = Science 
and Literature. 

Column 3 — Sex. M = Male, F = Female.

Column 4 — The age of students. 


 I    II   III   IV        I    II   III   IV        I    II   III   IV

 Jr   Co   F     20        Fr   HE   F     22        Fr   En   M     21
 Sr   Ag   M     25        So   En   M     19        Jr   HE   F     21
 Fr   Ag   M     20        Sr   En   M     25        So   Ed   F     19
 Fr   Co   M     20        So   Co   F     19        So   Co   M     19
 Fr   Co   F     18        Jr   En   M     27        Fr   En   M     24
 Fr   SL   F     21        So   SL   F     19        Sr   SL   M     23
 Fr   Ed   M     19        Sr   SL   M     21        Sr   HE   F     23
 So   Ed   F     21        Fr   SL   F     21        So   Ag   M     22
 So   En   M     21        Jr   Co   M     22        Fr   Co   M     20
 Fr   Ag   M     18        Fr   Co   M     23        Fr   En   M     24


(a) Make a one-way table with IV. 

(b) Make a two-way table with III and IV. 

(c) Make a three-way table with I, III, and IV. 

(d) Make a four-way table. 
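A cross-classification is a count of the records falling in every combination of classes. As one way of checking a hand-made tally, the Python sketch below counts the thirty records of Exercise No. 1 by class (I) and sex (III), with the class stubs down the side and the sex captions across the top. The names `cross_tab`, `RECORDS`, and the field positions are hypothetical, and the records are read row by row from the data table above.

```python
from collections import Counter

# Records of Exercise No. 1: class (I), school (II), sex (III), age (IV).
RAW = """
Jr Co F 20
Fr HE F 22
Fr En M 21
Sr Ag M 25
So En M 19
Jr HE F 21
Fr Ag M 20
Sr En M 25
So Ed F 19
Fr Co M 20
So Co F 19
So Co M 19
Fr Co F 18
Jr En M 27
Fr En M 24
Fr SL F 21
So SL F 19
Sr SL M 23
Fr Ed M 19
Sr SL M 21
Sr HE F 23
So Ed F 21
Fr SL F 21
So Ag M 22
So En M 21
Jr Co M 22
Fr Co M 20
Fr Ag M 18
Fr Co M 23
Fr En M 24
"""
RECORDS = [line.split() for line in RAW.strip().splitlines()]
CLASS, SCHOOL, SEX, AGE = range(4)  # column positions within each record


def cross_tab(records, row_field, col_field):
    """Count the records falling in each (row class, column class) cell."""
    return Counter((rec[row_field], rec[col_field]) for rec in records)


counts = cross_tab(RECORDS, CLASS, SEX)
class_order = ["Fr", "So", "Jr", "Sr"]
sexes = ["M", "F"]

print(f"{'':<6}" + "".join(f"{s:>5}" for s in sexes) + f"{'Total':>7}")
for c in class_order:
    row = [counts[(c, s)] for s in sexes]
    print(f"{c:<6}" + "".join(f"{n:>5}" for n in row) + f"{sum(row):>7}")
col_totals = [sum(counts[(c, s)] for c in class_order) for s in sexes]
print(f"{'Total':<6}" + "".join(f"{n:>5}" for n in col_totals) + f"{sum(col_totals):>7}")
```

Passing other field positions to `cross_tab` gives the remaining two-way tables; the row and column totals should always come back to 30.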
No. 2. Data for Cross-Classification Tables


Age, Weight, Height, and Sex of Grade School Children


 Age     Weight   Height   Sex        Age     Weight   Height   Sex
 Years   Pounds   Inches              Years   Pounds   Inches

   9       61       60      F          12      112       64      M
   7       55       57      M          13      110       68      M
  12       98       67      F          10       98       63      F
   6       52       48      F           9       88       61      F
   6       56       51      M           7       73       58      M
   8       63       59      F           6       54       52      F
  10       85       62      F          11       93       64      M
  11       91       65      M           8       87       60      F
  10       79       58      F          13      120       69      M
   7       62       49      M           9       96       62      F


(a) Make a one-way table with Age. 

(b) Make a two-way table with Weight and Height. 

(c) Make a three-way table with Sex, Age, and Height. 

(d) Make a four-way table. 
GRAPHIC PRESENTATION


Psychologists long ago proved that the eye takes in much more 
information than the ear. The little child is always saying, “Let
me see.” The student is conscious that charts and graphs reveal
ideas much more clearly than do tables and formulas. Graphic 
presentation is a powerful device to make statistical ideas clearer 
not only to the statistician in his analysis, but also to those who 
must interpret and use his findings and conclusions. 

Of the many available methods of making graphs we shall limit 
ourselves to six. These are: 

1. Line curves or graphs

2. Bar charts

3. Band charts

4. Pie diagrams

5. Statistical maps

6. Pictographs

The simplest to understand, the easiest to make, the most 
versatile, and the most widely used type of chart is the line graph.
It enables one to present more information of a more complex 
nature in a perfectly understandable form than any other kind 
of chart. It requires the least technical skill to make. 

BASIC REQUIREMENTS FOR LINE GRAPHS 

1. In such graphs, the independent variable is always placed on
the X-axis and the dependent variable on the Y-axis. If time is
one of the variables, it is always placed on the X-axis.

2. Quantity or value is measured on the Y-axis. The scale of
the Y-axis should always begin at zero (0), even if the lowest
figure for any X period is far above zero. It is deceptive to begin
the Y-scale at a point above zero without clearly indicating the
fact, as is done in Fig. 12, in which the broken space indicates that
a part of the scale has been omitted, or in Fig. 13, which shows
that part of the scale is lacking. If space does not make it
necessary to cut the Y-axis short, it is always better to show it in
full, as is done in Fig. 11 showing steel production.


[Vertical scale: millions of tons]
Fig. 11. Steel production in U.S., 1929-1941 inclusive.
(Steel Facts, American Iron and Steel Institute, May, 1942)


3. In a line graph in which time is shown on the X-axis, it is
necessary to plot the point of data for each time period at the
middle of the space allotted to that time period, as is done in Figs.
11, 12, and 13, when the Y-values are averages or totals for that
period.

4. If the Y-quantities are to show cumulative values in which the
quantity for each succeeding period is added to the preceding one,
the point is plotted at the right-hand edge of the space allotted to
the period, as is shown in the cumulative line in Fig. 14 on
automobile production. This is necessary because the entire quantity
is not obtained until one reaches the end of the period in which it
is being accumulated.

5. All graphs, like all tables, should have a title and number 
and should give the source of the data. 
[Vertical scale: millions of bushels]
Fig. 12. Corn production in U.S., 1923-1935. (Agricultural
Statistics, 1936, p. 33)

[Vertical scale: millions of bushels]
Fig. 13. Wheat production in U.S. and in Canada, 1923-1935.
(Agricultural Statistics, 1936, p. 11)

6. If more than one line is shown on a graph, the meaning of 
each line should be clearly designated by a legend or by writing 
in the character of the line in the body of the graph as is done in 
Fig. 13 showing wheat production in the United States and Canada. 
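Rules 3 and 4 above fix where each point goes along the X-axis. A small Python sketch, assuming each time period occupies one unit of width with the first period running from 0 to 1; the function name `line_points` and the sample values are hypothetical.

```python
from itertools import accumulate


def line_points(values, cumulative=False):
    """Return (x, y) plotting points for one value per time period.

    Totals or averages for a period are plotted at its middle (rule 3);
    cumulative values are plotted at its right-hand edge (rule 4).
    """
    if cumulative:
        ys = list(accumulate(values))
        xs = [i + 1.0 for i in range(len(values))]  # end of each period
    else:
        ys = list(values)
        xs = [i + 0.5 for i in range(len(values))]  # middle of each period
    return list(zip(xs, ys))


print(line_points([3, 4, 5]))
print(line_points([3, 4, 5], cumulative=True))
```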

The Z chart, of which Fig. 14 is an example, shows three values:
(1) the monthly production, measured by the scale on the right-hand
end of the chart, (2) the cumulative monthly production,
and (3) the twelve-month moving total, or the total produced
during the past twelve months. The cumulative monthly production
and the twelve-month moving total are measured on the
left-hand Y-scale. This type of graph is useful in the management
of production and sales of either goods or services.
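The three lines of a Z chart can be computed from two years of monthly figures. A minimal Python sketch, assuming twelve monthly values for the preceding year are available to start the moving total; `z_chart` is a hypothetical name and the sample values are placeholders, not data from Fig. 14. In December the moving total and the cumulative total are the same figure, which is where the two upper lines of the "Z" meet.

```python
from itertools import accumulate


def z_chart(previous_year, current_year):
    """Return (monthly, cumulative, twelve-month moving total) for the current year."""
    if len(previous_year) != 12 or len(current_year) != 12:
        raise ValueError("expected twelve monthly figures for each year")
    cumulative = list(accumulate(current_year))
    combined = list(previous_year) + list(current_year)
    # Moving total for month m: the twelve months ending with month m.
    moving_total = [sum(combined[m + 1 : m + 13]) for m in range(12)]
    return list(current_year), cumulative, moving_total


previous_year = [10] * 12  # placeholder figures
current_year = [20] * 12   # placeholder figures
monthly, cumulative, moving = z_chart(previous_year, current_year)
for month, (a, b, c) in enumerate(zip(monthly, cumulative, moving), start=1):
    print(f"{month:>2}{a:>6}{b:>7}{c:>7}")
```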

[Left and right scales: thousands of cars]
Fig. 14. Actual and cumulative monthly production and 12-month
moving total of automobile production in U.S., 1940.
(Standard and Poor’s)


One of the great advantages of the line graph is that several 
different series of data, from five to ten series in many cases, can 
be shown on the same chart without confusing the reader in the 
least. No other statistical chart can carry so heavy a load without 
hopelessly mixing the data. Fig. 15 carries five series of rice 
production data, and the measure of each item is clear. The 
same may be said of Fig. 17, which gives the production of
petroleum for five leading states. 

Figures 15 and 16 should be studied together. Figure 15 shows
the production of rice in millions of tons in India, Japan, Java,
the Philippine Islands, and the United States as measured on the
natural or absolute scale. The absolute production and absolute
variations in production are shown. On such a scale the large
producers, India and Japan, show wide fluctuations, while the
three smaller producers show little change from year to year.

[Vertical scale: millions of tons]
Fig. 15. Rice production in India, Japan, Java, the Philippines,
and the U.S., 1923-1935, Arithmetic Scale. (U.S. Yearbook
of Agriculture, 1936, p. 69)

Figure 16 is made on semi-logarithmic paper. It has the natural
scale on the X-axis but has a ratio, percentage, or logarithmic
scale on the Y-axis. This type of chart reduces the variations of
all the series to the same percentage base. In fact, it shows that
India actually has the least percentage of fluctuation in rice
production of any of the five countries and that the United States
and the Philippines have the greatest year-to-year changes. This
percentage change cannot be shown on the natural scale. The
ratio chart or semi-logarithmic graph is of great value in the
measurement of cyclical and seasonal fluctuations in sales,
production, employment, and other business activities. It is
especially useful in showing the percentage changes of small and
large departments in the same store or factory, and in revealing
the ratio fluctuations of small and large plants in the same
industry. It may also be used with sociological, psychological,
and educational data.

[Horizontal scale: years 1923 to 1935]
Fig. 16. Rice production in India, Japan, Java, the Philippines,
and the U.S., 1923-1935, Logarithmic Scale. (U.S. Yearbook
of Agriculture, 1936, p. 69)

Principles of Construction. 1. Normal or absolute scale 
charts give equal spaces to equal absolute values. This means 
that an increase or decrease of 10 would receive the same space 
on the graph whether the 10 were added to 100 or to 1,000 or to 
10,000. If the chart were drawn to give 10 a space of .1 of an 
inch, it would receive this space, regardless of whether it fell at 
the top or bottom of the chart. 

2. Ratio or semi-logarithmic charts give equal spaces to equal
percentage changes, regardless of where they fall on the graph.
This means that a 10% increase in quantity is given the same
space on the chart whether it occurs at the bottom or at the top.
The space given to a change is proportional to the logarithm of
the ratio of the new value to the old, not to the percentage itself.
If a 100% increase is allotted one inch on a chart, a 50% increase
receives about 0.58 inch, and a 10% increase about 0.14 inch.
Ten added to ten is a 100% increase, but 10 added to 20 is only
a 50% increase and receives only about 0.58 as much space, while
10 added to 100 is only a 10% change and receives only about
one-seventh as much space. This means that the higher one goes
on the chart, the larger must be the absolute increase to make the
same percentage increase. At the bottom of the chart, 5 added to
10 is a 50% increase, but higher on the graph, 500 must be added
to 1,000 to make a 50% change, and still higher, 50,000 must be
added to 100,000 to make the same percentage change. Equal
percentage changes receive equal space on a ratio chart.

The scale runs from the bottom to the top of the chart on
page 147.

Scale    Percentage increase to the next higher mark

 100
  90     11 1/9
  80     12.5
  70     14 2/7
  60     16 2/3
  50     20
  40     25
  30     33 1/3
  20     50
  10     100
   9     11 1/9
   8     12.5
   7     14 2/7
   6     16 2/3
   5     20
   4     25
   3     33 1/3
   2     50
   1     100

One added to 1 is a 100% increase. Ten added to 10 is a 100%
increase. Both have the same space on the chart.

One added to 2 is a 50% increase. Ten added to 20 is a 50%
increase. Both changes have the same space on the chart, which
is somewhat more than half (about 0.58) of the space given to
100% changes.

All the other spaces are in proportion to the logarithms of the
ratios between adjacent marks, so a larger percentage change
always receives a larger space.
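The spacing rule of a ratio chart can be checked numerically. This sketch assumes a scale on which a doubling (a 100% increase) occupies one inch, and places each value at a height proportional to its base-2 logarithm; for contrast it also computes the space on a natural scale at a hypothetical 0.01 inch per unit. The function names are hypothetical. It shows that 1 to 2 and 10 to 20 receive identical space, while a 50% rise receives about 0.58 of the space of a 100% rise rather than exactly one-half.

```python
import math


def ratio_height(value, inches_per_doubling=1.0):
    """Height on a ratio (semi-logarithmic) scale: proportional to log2(value)."""
    if value <= 0:
        raise ValueError("a ratio scale has no place for zero or negative values")
    return inches_per_doubling * math.log2(value)


def ratio_space(old, new, inches_per_doubling=1.0):
    """Vertical space between two values on a ratio scale."""
    return ratio_height(new, inches_per_doubling) - ratio_height(old, inches_per_doubling)


def arithmetic_space(old, new, inches_per_unit=0.01):
    """Vertical space between two values on a natural (arithmetic) scale."""
    return (new - old) * inches_per_unit


for old, new in [(1, 2), (10, 20), (2, 3), (20, 30), (100, 110)]:
    pct = 100 * (new - old) / old
    print(f"{old:>4} -> {new:<4}{pct:6.1f}%   ratio {ratio_space(old, new):.3f} in"
          f"   arithmetic {arithmetic_space(old, new):.2f} in")
```

On the natural scale the space depends only on the absolute difference, so 10 added to 10 and 10 added to 100 look alike; on the ratio scale it depends only on the ratio.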

It should be noted that on a line chart the several lines showing
the several variables may cross each other without giving
the least confusion to the reader if the adjacent lines are drawn
in a different style or color. It is preferable to draw solid lines
unless they touch or cross at an angle that would cause the reader
to lose their identity. In such cases the following variations may
be used to ensure clarity:

[Figure: five sample line styles, numbered 1 to 5.]

The student can readily devise other types of lines if more
variation is necessary.

Figures 17 and 18, showing the production of petroleum for 
Texas, California, Oklahoma, Kansas, and Louisiana for the years 
1922 to 1934 inclusive, indicate the relative advantages of the 
two types of line graphs. 

Percentage, ratio, or semi-logarithmic graphs at the present
time are widely used by businessmen and research workers in the
fields of economics, political science, sociology, and psychology.
The student should become thoroughly familiar with their
construction and use. It is often more important to know whether
sales have increased or decreased 10% than it is to know that the
change is $5,000. What percentage increase in sales resulted
from an advertising expenditure of $10,000? What has been the
percentage decrease in the death rate? What percent of freshmen
finally graduate from college? We live by means of comparisons
in a world of variables, and the rate of change is an essential
part of the information necessary to wise decisions. The
semi-logarithmic chart is a ready device to this end.

Fig. 17. Petroleum production in Texas, Oklahoma, California,
Kansas, and Louisiana, 1922-1934, Arithmetic Scale.
(Petroleum Facts and Figures, American Petroleum Institute, 1937)

[Vertical scale in millions]
Fig. 18. Petroleum production in Texas, Oklahoma, California,
Kansas, and Louisiana, 1922-1934, Logarithmic Scale.
(Petroleum Facts and Figures, American Petroleum Institute, 1937)


BAR CHARTS 

The second most widely used type of graph is the bar chart. 
It is more striking to the eye and arrests the attention of the 
reader with more force than the line graph, but it is much more 
limited in its possible variations and in the load of data it can 
carry without confusing the reader. Bar charts should preferably 
be limited to single bars for purposes of clarity as is shown in 
Fig. 19, giving the population of leading Ohio cities for 1940. 
In this chart there is one bar for each city. The data given are 
quite limited, but their relative sizes are so strikingly clear that 
the idea is obtained at one glance. Such graphs are especially 
effective for the popular presentation of simple relationships.

[Bars: Cleveland, Cincinnati, Columbus, Toledo; horizontal scale
in hundreds of thousands, 0 to 9]
Fig. 19. Population of four principal Ohio cities, 1940.
Single bar graph. (U.S. Decennial Census, 1940)

In Fig. 20 the double bar chart is employed. It makes a
comparison for each variable between two dates or areas. This chart
shows the population of leading Ohio cities for 1940 and 1930.
The double bar chart is nearly as easy to read as is the single bar,
and it has the advantage of presenting twice as much data. The
two sections of the double bar should be so clearly distinguished
that no confusion could result.

[Horizontal scale in hundreds of thousands]
Fig. 20. Population of four principal Ohio cities, 1940 and
1930. Double bar graph

In Fig. 21, the triple bar chart is used. It is much more difficult
to read than is the single or double bar, but it gives three
comparisons. If it is used, the segments of the bars should be so
clearly distinguishable in cross-hatch or color that they can be
read quickly and that no confusion will result.

In any ordinary circumstance a triple bar is as complex a bar
chart as should ever be used. Bars with four or five segments
result more in confusion than they do in clarity. If there are
more than two or at most three variables to present, they should
be charted in a line graph, or in separate single bar charts.

[Bars: Cleveland, Cincinnati, Columbus, Toledo; horizontal scale
in hundreds of thousands]
Fig. 21. Population of four principal Ohio cities, 1940, 1930,
and 1920. Triple bar graph

Time is placed on the Y-axis in bar charts as is done in Figs.
19-21, but usually if time is to be emphasized as the independent
variable, the chart becomes a vertical bar chart with time on the
X-axis. Figure 22, measuring steel production from 1929 to 1941
inclusive, is such a chart, designed to show the fluctuations of
steel production through succeeding periods of prosperity,
depression, and war activity. This information could more easily
be shown on a line graph, but the bar chart presents the facts in
a more glaring and forceful form. Such a graph is an excellent
way of emphasizing a single time series.

[Vertical scale: millions of tons]
Fig. 22. Production of steel in U.S., 1929-1941. Vertical bar graph.
(Steel Facts, Iron and Steel Institute, May, 1942)

The double vertical bar chart is frequently used to compare
two closely related variables through the same time periods.
Figure 23, showing the population of New York State and
Pennsylvania from 1880 to 1940 inclusive, is a good example of this
chart. These data could be shown on a line graph, but the double
bars emphasize the relationships more clearly. It is not
advisable to make vertical bar charts more complex than double
bar graphs. Triple and quadruple bars are too confusing to the
reader to be practical. For more than two variables on the same
time graph, line charts should be used.

[Vertical scale: population in millions]
Fig. 23. Population of New York and Pennsylvania, 1880-1940.
(U.S. Decennial Census, 1940)

For pure scientific analysis and in all other cases in which no
psychological or emotional appeal is desired, the simple line graph
is preferable. But in all cases in which the mind of the reader
must be struck or challenged with an idea, the bar chart is more
effective.

In making both vertical and horizontal bar charts the number of 
guide lines from the quantity scale across the graph should be 
only sufficiently numerous to give the approximate length of the 
bars. It spoils a bar chart to have it covered with guide lines so 
close together that they obscure the bars. The bars must stand 
out, or the bar chart loses its force and value. In the horizontal 
bar chart, if the lower bars are shorter, the guide lines do not 
need to extend to the bottom of the graph. 

In making bar charts the numbers represented by the bars
should never be written at the top or right-hand end of the bars.
Such a practice obscures the length of the bars and deceives the
reader as to their actual and relative length. The figures or
numerical values represented by the bars should be written either

1. inside the hollow bars, as is shown in Fig. 24,

2. at the base end of the bar on the opposite side of the base
line, as is shown in Fig. 25, or

3. in an adjoining table.

[Bar labels at the base line: New York 4,000; Chicago 3,000;
Detroit 2,000; and years 1927, 1928, 1929 below vertical bars]
Fig. 25. Correct forms for bar charts

Never write the values at the ends of the bars:

[Values such as 5,000 and 3,000 printed at the ends of the bars]
Fig. 26. Incorrect location of numbers on a bar chart

Frequency Polygons and Histograms. Frequency dis- 
tributions are best pictured in frequency polygons or in histo- 
grams. These were given and explained in Chapter 6. Figures 
3-6 are histograms showing the frequencies in each class interval 
in the sample. Figures 7-10 are frequency polygons showing the 
same information as the histograms. The student should review 
these specialized types of line and bar charts and note the method 
of their construction. They are timeless in that they measure 
varying quantities at one point or instant of time. 
BAND CHARTS


A band chart is a type of line graph which shows the total for
successive time periods broken up into sub-totals for each of the
component parts of the total. The spaces between the successive
lines are filled in with cross-hatch or color to emphasize the
quantity of each factor for each time period. This type of graph is
especially useful in dividing total costs into component costs,
total sales into department or district or individual salesman’s
sales, total production by states, plants, or industries, and other
such relations.

[Vertical scale: millions of dollars]
Fig. 27. Federal, state and local taxes of a group of
major steel companies, 1930-1940. (Steel Facts,
American Iron and Steel Institute, May, 1941)

Fig. 28. Industrial production of minerals, durable and
non-durable goods, 1928-1940. (The Cleveland Trust Company
Business Bulletin, April 15, 1942)
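The lines of a band chart are running totals: the lowest line is the first component, the next is the first two components added together, and so on up to the total. A Python sketch, assuming the components are listed from the bottom band upward; the function name and the sample cost figures are hypothetical placeholders, not data from Figs. 27 and 28.

```python
def band_boundaries(components):
    """Return the upper edge of each band, period by period.

    `components` maps a component name to its series of values, listed from
    the bottom band to the top band; each edge is the running sum of the
    component and all the bands below it.
    """
    edges = {}
    running = None
    for name, series in components.items():
        if running is None:
            running = [0] * len(series)
        if len(series) != len(running):
            raise ValueError("all components must cover the same periods")
        running = [r + v for r, v in zip(running, series)]
        edges[name] = running
    return edges


# Hypothetical cost components for three periods.
costs = {"Materials": [40, 45, 50], "Labor": [30, 32, 35], "Overhead": [10, 12, 11]}
for name, edge in band_boundaries(costs).items():
    print(f"{name:<10}{edge}")
```

The top edge is the total for each period, so the band chart carries the total line at no extra cost.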



[Axis: percentage of population]
Fig. 29. Lorenz Curve. Uniformity of density
of population in New Jersey in 1940. (U.S.
Decennial Census)

The Lorenz Curve shown in Fig. 29 is quite useful for showing
the degree to which one variable is evenly or unevenly distributed
in its relation to another variable in terms of percentages. The
more the data line curves away from the straight diagonal line of
perfectly even distribution, the more unequal is the distribution
of the series. This type of chart is useful in presenting the
distribution of population over an area, the distribution of
income in a population, and similar relationships.
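A Lorenz curve plots cumulative percentages of one variable against cumulative percentages of another. A minimal Python sketch for the population-density case, assuming each areal unit (a county, say) is given as an `(area, population)` pair and the units are taken from the least to the most densely settled; the function name and the test figures are hypothetical. If every unit had the same density, all the points would fall on the diagonal.

```python
def lorenz_points(units):
    """Return (cumulative % of area, cumulative % of population) points.

    `units` is a list of (area, population) pairs; they are ordered from the
    lowest to the highest density before accumulating, and the curve starts
    at the origin.
    """
    ordered = sorted(units, key=lambda u: u[1] / u[0])
    total_area = sum(area for area, _ in ordered)
    total_pop = sum(pop for _, pop in ordered)
    points = [(0.0, 0.0)]
    cum_area = cum_pop = 0
    for area, pop in ordered:
        cum_area += area
        cum_pop += pop
        points.append((100 * cum_area / total_area, 100 * cum_pop / total_pop))
    return points


# Two hypothetical units of equal area, one three times as densely settled.
for x, y in lorenz_points([(1, 3), (1, 1)]):
    print(f"{x:6.1f}% of area holds {y:6.1f}% of population")
```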
[Vertical scale: thousands of dollars]
Fig. 30. Cost of sales, sales expense, total sales, and
profits, A. B. Hardware store

PIE DIAGRAMS

A pie chart is a device for reducing a percentage total of 100 to
its component parts. The diagram consists of a circle, the
circumference of which is divided into 100 parts. Each component
part of the total is given a share of the circumference equal to its
percentage of the total. This type of chart is useful wherever there
is need for picturing the relative size of the component parts of a
whole. Its adaptations are almost limitless. It may show total sales
by departments, states, districts, salesmen, or branches; production
by departments, plants, states, or nations; total costs by items; etc.

Fig. 31. Population of New England States as percentages of total
population of all New England, 1940. (U.S. Decennial Census)

Fig. 32. Percentages of total petroleum production by states, 1936.
(Petroleum Facts and Figures, American Petroleum Institute)

Fig. 33. Amount of gasoline dollar spent for taxes, 1926 and 1936.
(Petroleum Facts and Figures, 1937, and Current Price Quotations)

In making percentage pie charts the following points should be 
carefully observed: 

1. Divide the circumference of the circle into 100 spaces, each 
equal to 1%. 

2. Reduce the data to be plotted to percentages for each class 
based on the total of all classes as 100%. 

3. Locate zero (0) percentage at the top of the circle chart and 
measure to the right for increasing percentages, coming around to 
the top again for 100%. 

4. Write in each segment of the chart its class name or indicate the meaning of each segment in an accompanying legend.

5. For comparisons between two or more time periods or between two or more areas or classes, use separate charts for each one.

6. The largest items in the series should be placed first and the others should follow in declining order of size, with the smallest or “All Others” last.
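Rules 1, 2, 3, and 6 can be sketched in Python. The data are the 1940 populations of the North, South, and West (in thousands) from the exercise table at the end of this chapter. The function name and the `other_label` parameter are illustrative choices, not part of any established method. Positions are given in hundredths of the circumference, measured clockwise from zero at the top. Multiplying them by 3.6 converts them to degrees.

```python
# 1940 population in thousands, from the exercise table at the end of this chapter.
populations = {"North": 76_120, "South": 41_666, "West": 13_883}

def pie_segments(data, other_label="All Others"):
    """Return (name, percent, start, end) for each segment of a percentage pie.

    Start and end are in hundredths of the circumference, measured clockwise
    from zero at the top (rule 3); multiply by 3.6 for degrees. Items run from
    largest to smallest, with other_label, if present, placed last (rule 6).
    """
    total = sum(data.values())
    order = sorted(data, key=lambda name: (name == other_label, -data[name]))
    segments, start = [], 0.0
    for name in order:
        pct = 100 * data[name] / total  # rule 2: percentage of the whole
        segments.append((name, round(pct, 1), round(start, 1), round(start + pct, 1)))
        start += pct
    return segments

for segment in pie_segments(populations):
    print(segment)
```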




PLUS AND MINUS CHARTS 



Fig. 34. Trend of Iowa business. (Iowa Business Digest, July 31, 1942)

Fig. 35. Change in automobile registrations by states between the prosperous year of 1929 and the depression year of 1930. (Cleveland Trust Company Business Bulletin, May 15, 1931)
STATISTICAL MAPS

Most data bear some relationship to space. They may be identi- 
fied with some area or location. The statistical map is a widely 
used method for the presentation of data. Such maps may be 
made by (1) the cross-hatch method or (2) the dot method.
Cross-hatch maps are used to make average or total comparisons 
between relatively large areas. 

Cross-hatch Maps. The error in such maps is that a large area, a state for instance, is all given the same density or rating while there may actually be a great deal of variation within its boundaries. Kansas and Nebraska both have a dense population in the east and a very sparse settlement in the west. A similar disparity occurs in California, Arizona, and even in New York. Cross-hatch maps should always be understood as comparing average or total figures between areas without regard to internal variations.

Fig. 36. Farm population of Texas and Oklahoma, 1930. (U.S. Census of Agriculture, 1930)

Small Dot Maps. The type of dot map shown in Fig. 36 is useful in measuring variations or gradations of any kind within a large area. Population, sales, wealth, production, or any other continuous variable may be so measured. The small dots are placed as nearly as possible at the exact location on the map where the population or activity exists. Such maps are excellent for detailed analysis.

Rules for Making Small Dot Maps. 1. Decide how many units of data should be represented by one dot.

2. Make the number of units of data represented by one dot small enough that the dots will just cover the densest portion of the map solidly.

3. Scatter the dots at random over the smallest unit for which the data are given.
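The three rules for small dot maps come down to a little arithmetic plus random placement. In this sketch the densest-unit figures are hypothetical. Each unit (say, a county) is approximated by a hypothetical rectangle `bounds`. A real map would scatter the dots within the actual county outline.

```python
import random

def units_per_dot_for(densest_units, dots_to_fill_densest):
    """Rule 2: choose the dot value so the densest unit is just covered solidly."""
    return densest_units / dots_to_fill_densest

def dots_for_area(units, units_per_dot):
    """Rule 1: number of dots for one area (Python's round() sends halves to even)."""
    return round(units / units_per_dot)

def scatter_dots(n, bounds, seed=0):
    """Rule 3: scatter n dots at random inside one unit.

    bounds = (xmin, ymin, xmax, ymax) is a hypothetical rectangle standing in
    for the outline of the smallest unit for which data are given.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

# Hypothetical: the densest county has 250,000 people and about 500 dots fill it.
per_dot = units_per_dot_for(250_000, 500)   # 500.0 people per dot
n_dots = dots_for_area(12_300, per_dot)     # a county of 12,300 people -> 25 dots
points = scatter_dots(n_dots, (0.0, 0.0, 10.0, 5.0))
print(per_dot, n_dots, len(points))
```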

Large Dot Map. Large dot maps merely indicate the amount 
of population or other data in an area without exactly locating 
them in the area. Figure 38 shows that Texas has about 6.4
million people, but does not show that more than three-fourths
of this population is located in the eastern half of the state. The 
map for Arizona does not show that one-third of the state’s people 
live in the Salt River Valley. Such maps are useful only for 
making large area or total comparisons. Sometimes this is the 
only type of comparison needed. Unless some detailed use is to 
be made of county, city, or minor political subdivision data, it is a 
waste of time to make a small dot map. The large dot map serves
the purpose just as well and takes much less time and expense to 
produce. 

Pin Head Maps. The pin head map, constructed by sticking thumb tacks or pins with heads of different colors in specific locations on a map to indicate some specific activity or data, is a very useful device for business management purposes or for organization controls.

Fig. 38. Density of population of the U.S. showing the amount of population in each geographic subdivision without exactly locating the data. (U.S. Decennial Census, 1940)

Fig. 39. Location of institutions of higher education in Oklahoma, 1940

The methods of graphic presentation listed above include all 
the variations in charts likely to be used by the average statis- 
tician or ordinary business concern. If the student needs other 
variations of these basic methods or more complex patterns, he 
should consult the excellent and complete work, Graphic Presen- 
tation by Willard C. Brinton, published by Brinton Associates, 
New York City, 1939, or one of the references given at the end of 
this chapter. 

Graphic presentation has become a fine art and deserves the 
careful study of all students of statistics whose research has 
produced something worthy of presenting to the public. It has 
also a wide use in business analysis and advertising. Even the 
mathematical research statistician finds it useful in clarifying his 
own ideas for further investigations. 


PICTOGRAPHS 


As the name indicates, a pictograph is a combination of a graph and a picture. It is rather a graph made up of a number of pictures. There has been a rapid expansion in their use in recent years. They are widely employed in advertising, in the daily press, in popular magazines, and in books designed for popular appeal. They add little or nothing to rigorous statistical analysis, but have educational value and popular appeal in the presentation of statistical analysis. In the more serious and scientific statistical studies, they should be used conservatively if at all, but where popular appeal is desired, they are effective.

Fig. 40. Comparison of military forces of Allies vs. Axis, 1939. (Every Week, February 12, 1943)

Fig. 42. Steel employment, payrolls, and production in 1939 and 1941. (Steel Facts, No. 50, October, 1941)

Fig. 43. Comparison of family earnings of automobile factory workers and all U.S. wage workers, 1935-1936. (Automobile Facts, No. 6, December, 1938)


SUMMARY 

1. Graphic presentation is the use of lines, bars, diagrams, maps, or 
pictures to illustrate the meaning and relationships revealed in the 
analysis of data. 

2. On the arithmetic or absolute scale, equal absolute numerical 
values are given equal spaces. 

3. On ratio or logarithmic scales equal spaces are given to equal per- 
centage values. 

4. Line graphs can carry more complex relationships or data without 
confusion than other types of charts. 

5. Bar charts should never contain more than triple compound bars 
and preferably never more than double bars. More than this leads to 
confusion. 

6. Pie or circle charts are usually employed to divide a total value 
into its percentage subdivisions. 

7. Pictographs or statistical graphs composed of pictures are em- 
ployed more for popular appeal than for scientific accuracy and are of 
special value in advertising, popular educational appeals or propaganda. 
The line graph is most widely employed in strictly scientific analyses. 

8. If numerical values are employed in bar charts they should never be written at the top or right-hand end of the bars. To do so tends to exaggerate the apparent length of the shorter bars.

9. The scale on the Y-axis should always begin at a base of zero (0)
in order to prevent distortion of the quantities presented in the graph. 

10. In making small dot maps the area of greatest density should appear as a solid patch of dots, the dots touching each other, in order to serve as an accurate base for comparison with less dense areas.

11. Large dot maps do not give detailed locations of the data, but 
only the comparative quantities among larger areas. 

12. The type of chart or figure used should be determined by the 
nature of the data, the type of analysis, and the persons to whom appeal 
is made. 
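Summary items 2 and 3 can be checked numerically. On an arithmetic scale, a rise from 100 to 200 and a rise from 1,000 to 2,000 occupy very different distances. On a ratio (logarithmic) scale both are 100% increases and occupy the same space, because the distance plotted is the difference of the logarithms. The figures are illustrative only.

```python
import math

def arithmetic_distance(a, b):
    """Space between a and b on an arithmetic (natural) scale."""
    return b - a

def ratio_distance(a, b):
    """Space between a and b on a ratio (logarithmic) scale."""
    return math.log10(b) - math.log10(a)

# Two 100% increases at very different levels (illustrative figures).
for low, high in [(100, 200), (1_000, 2_000)]:
    print(low, high, arithmetic_distance(low, high), round(ratio_distance(low, high), 5))
# 100 200 100 0.30103
# 1000 2000 1000 0.30103
```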
REVIEW QUESTIONS

1. Why is graphic presentation necessary? 

2. Why should a graph always begin its quantity scale at zero? Ex- 
plain. 

3. What is meant by the “natural” or “absolute” scale? For what should it be used?

4. What is the “percentage” or “ratio” scale?

5. What is “semi-logarithmic” paper? For what is it used? Explain.

6. Compare the equality of spaces on a “log chart” with that on a “natural-scale” chart.

7. What are the principal advantages of the line graph over other 
charts? Explain in detail. 

8. What are the principal advantages and limitations of the bar 
chart? 

9. Where should “time” usually be placed on a graph? Why?

10. What are the principal uses of the “Z-chart”? Explain in detail.

11. Explain the uses of the “band chart.” Give illustrations of its 
uses. 

12. Explain the proper construction of a “pie chart” and illustrate 
its common uses. 

13. Name four kinds of statistical maps and explain the uses and 
limitations of each one. 

14. Of what use is graphic presentation to the research statistician? 

15. What are “histograms” and “frequency polygons”? For what 
are they used? How are they constructed? 

16. What is the purpose of graphic presentation in business statistics? 
Explain in detail. 
EXERCISES

1. Make one graph on the arithmetic scale and another on the logarithmic scale for the following data on the population growth of the three principal sections of the United States (1940 Census, in thousands).

Year    United States    North     South     West
1890       62,947       39,817    20,028     3,102
1900       75,994       47,379    24,523     4,091
1910       91,972       55,757    29,389     6,826
1920      105,710       63,681    33,126     8,903
1930      122,775       73,021    37,858    11,896
1940      131,669       76,120    41,666    13,883


2. Make a horizontal single bar chart of the population of the three 
sections of North, South, and West for 1940. 

3. Make a vertical double bar chart of the population for North and 
South for 1910, 1920, 1930, and 1940. 

4. Make a pie-percentage chart for the three areas, North, South, and 
West for 1940. 




Part Two 

The Analysis of Large Samples 


CHAPTER 9 

AVERAGES 


An average is a representative or typical number which may be 
used to indicate the value of a large group of numbers. It is a 
device to aid the mind in grasping the central or true significance 
of a large aggregate of facts or measurements while freeing one- 
self from the confusing burden of details. If the mind could 
grasp all the details of a large volume of data, understand all 
their interrelations, and retain all the information at once, sta- 
tistics would be unnecessary. If we could keep in mind all the 
details of the ages, heights, and weights of a hundred children, 
the yields of ten thousand acres of corn, the wages of a million
men, or prices of a thousand stocks, and the relationships which 
exist among them, we could dispense with averages and other 
statistics. The difficulty of keeping all details of the data in mind 
at once obscures our understanding of the central and logical re- 
lationships among the data. The selection of an average, or a 
central, typical representative number for the whole group removes these difficulties.

The reason an average is valid as a representative figure for a 
group of related measurements is that most samples of such data 
tend to pile up, or be concentrated, near the center of the group. 
In such a frequency distribution, items that differ to a small 
degree from the middle items are much more numerous than 
those which differ to a large degree. Among a thousand men there
might be one or two only four feet tall and one or two six and a 
half feet tall. There would be, however, perhaps eight hundred 
men between five and six feet tall, and probably five hundred 
between sixty-five and seventy inches tall. There may be only 
one adult in a million two feet tall, and one in ten million eight 
feet tall. As the deviations from normal increase, the number of
items decreases; as the deviations from normal decrease, the number
of items increases. It is this general tendency for related data to
cluster or pile up about the normal that makes it possible to use 
one central, typical number to represent the entire group. Work- 
sheet No. 6 and Figs. 4 and 5 illustrate this tendency. It is basic 
in statistical analysis. 

It would be very difficult for us to reason abstractly, or to be at 
home in the modern world without averages. We think of the 
average man, the average student, average costs, income, ex- 
penses, height, weight, age, intelligence quotient, life expectancy, 
prices, wages, houses, yields, etc. All modern sciences from as- 
tronomy and physics to psychology and economics are based on 
averages. A valid comparison of two or more groups can be 
made only by means of the comparison of their averages. 

There are five kinds of averages in common use. Why have 
more than one kind? For the same reason that the automobile 
mechanic has a variety of wrenches for different cars or various 
parts of the same car. For the same reason that a dentist has 
various forceps, drills, and other equipment for teeth that are 
radically different in shape and size. Some types of averages are 
better for one kind of data; others are better for another, or the 
same data may be analyzed by various methods for different 
purposes. The five averages are the arithmetic mean, the geo- 
metric mean, the harmonic mean, the median, and the mode. 
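Python's standard `statistics` module happens to implement all five averages named above (`geometric_mean` needs Python 3.8 or later). The sketch applies them to the seven items of Worksheet No. 12, used here only as convenient sample data. It shows that different averages of the same data generally differ.

```python
import statistics

items = [7, 12, 8, 5, 9, 10, 9]  # Worksheet No. 12, used as sample data

averages = {
    "arithmetic mean": statistics.mean(items),
    "geometric mean": statistics.geometric_mean(items),
    "harmonic mean": statistics.harmonic_mean(items),
    "median": statistics.median(items),
    "mode": statistics.mode(items),
}

for name, value in averages.items():
    print(f"{name:>15}: {value:.2f}")
```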


ARITHMETIC MEAN 

The most widely known and used average is the arithmetic mean. In popular language it is commonly called “the average.” It is a computed average and takes into consideration the two main characteristics of data, (a) the number of items, and (b) the size of the items. It is, therefore, a complete measure of data. Formulas No. 2, 3, 4, and 5 provide for the calculation of the arithmetic mean under four slightly different conditions.

Simple Summation

Meaning of symbols:

X or Y = individual items of data
X̄ or Ȳ = arithmetic mean
N = number of items of data
Σ (Sigma) = the sum of, or total

Formula No. 2

$$\bar{X} = \frac{\sum X}{N}$$

WORKSHEET NO. 12

  X
  7
 12
  8
  5
  9
 10
  9
 --
 60

In this method the mean is found by simply adding up the values 
of the several individual items of data and then dividing that total 
by the number of items. It is especially suited to small samples. 
In samples of more than 100 items, and especially for samples 
that run into many hundreds or thousands of items, it is prefer- 
able to use class intervals. The student will recall that the prin- 
ciples and methods for setting up class intervals and constructing 
tally sheets are explained in Chapter 6. 

The arithmetic mean is suited to the averaging of absolute 
numbers. The mean is always stated in the same units as the 
original data, such as bushels, inches, tons, dollars, etc. 
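Here is Formula No. 2 translated directly into code, using the items of Worksheet No. 12. The helper name `arithmetic_mean` is ours, chosen for illustration. The worksheet stops at the total, 60. Dividing by N = 7 gives a mean of about 8.57.

```python
def arithmetic_mean(items):
    """Formula No. 2: the sum of the items divided by the number of items."""
    if not items:
        raise ValueError("cannot average an empty list of items")
    return sum(items) / len(items)

worksheet_12 = [7, 12, 8, 5, 9, 10, 9]
print(sum(worksheet_12), len(worksheet_12), round(arithmetic_mean(worksheet_12), 2))
# 60 7 8.57
```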
WORKSHEET NO. 13


Computation of Arithmetic Mean of Wheat Yields on 160 Farms in Hard Winter Wheat Area of Kansas and Oklahoma, 1942


21  17  18  14  16  20  17  20  14  22
15  11  17  18  20  16  15  20  13  23
15  18  15  22  25  21  22  19  11  14
21  20  19  12  13  15  14  15  16  25
14  21  19  22  11  11  12  18  21  23
18  15  13  18  22  23  20  16  24  17
31  16  16  18  18  27   9  15  15  21
17  17  17  18  13  17  16  14  17  19
22  17  19  28  20  25  27  23  15  23
13  19   9  14  16  14  18  18  12  20
11  18  22  19  16  23  12  16  14  13
20  22  22  22  22  15  15  18  18  26
18  20  14   9  30  23  19  19  17  15
17  12  19  21  20  17  15  16  14  14
22  20  25  15  26  10  11  13  16  15
14  21  18  24  21  19  19  20  17  16

Total 2855

$$\bar{X} = \frac{\sum X}{N} = \frac{2855}{160} = 17.84 \text{ bushels}$$




Long Method with Class Intervals

Meaning of symbols:

i = size or width of class intervals
m = mid-point of class intervals, or class mark
f = frequency, or number of items in each class interval
N = total number of items in data, or total of all f's

Formula No. 3

$$\bar{X} = \frac{\sum fm}{N}$$
WORKSHEET NO. 14

Tally Sheet and Frequency Distribution of Wheat Yields on 160 Hard Winter Wheat Farms in Kansas and Oklahoma, 1942

Class Intervals     Tally Marks                            Frequency
Bushels of Wheat    (one tally mark for each item          Distribution
i                   of original data)                      f
 8.5-10.4           ||||                                     4
10.5-12.4           ||||| ||||| |                           11
12.5-14.4           ||||| ||||| ||||| |||||                 20
14.5-16.4           ||||| ||||| ||||| ||||| ||||| ||||      29
16.5-18.4           ||||| ||||| ||||| ||||| ||||| |||||     30
18.5-20.4           ||||| ||||| ||||| ||||| |||||           25
20.5-22.4           ||||| ||||| ||||| ||||| |               21
22.5-24.4           ||||| ||||                               9
24.5-26.4           ||||| |                                  6
26.5-28.4           |||                                      3
28.5-30.4           |                                        1
30.5-32.4           |                                        1
                                                       N = 160


Twelve class intervals were set up, each two bushels wide, so 
as to throw the data in each class as near the mid-point of the 
class as possible, following Rule Five on making frequency dis- 
tributions in Chapter 6. The data in Worksheet No. 13 were 
tallied in Worksheet No. 14 and the totals accumulated in the 
frequency column. This frequency distribution is quite smooth 
and normal. It shows a heavy concentration of data in the five 
classes between 12.5 and 22.4 bushel yields, which together include 125 of the 160 farms. The arithmetic mean for these data is computed by three methods: (1) from the individual items, Worksheet No. 13; (2) by the long method with a frequency distribution, Worksheet No. 15; and (3) by the short method with a frequency distribution, Worksheet No. 16.
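The tally of Worksheet No. 14 can be reproduced mechanically from the 160 yields of Worksheet No. 13. This sketch assumes, as the worksheet does, two-bushel classes beginning at 8.5. Since every yield is a whole number, `(x - 8.5) // 2` gives the class index directly. It also computes the mean of the individual items, which can later be compared with the grouped results.

```python
from collections import Counter

# Wheat yields in bushels on 160 farms, Worksheet No. 13, in reading order.
yields = [
    21, 17, 18, 14, 16, 20, 17, 20, 14, 22,
    15, 11, 17, 18, 20, 16, 15, 20, 13, 23,
    15, 18, 15, 22, 25, 21, 22, 19, 11, 14,
    21, 20, 19, 12, 13, 15, 14, 15, 16, 25,
    14, 21, 19, 22, 11, 11, 12, 18, 21, 23,
    18, 15, 13, 18, 22, 23, 20, 16, 24, 17,
    31, 16, 16, 18, 18, 27, 9, 15, 15, 21,
    17, 17, 17, 18, 13, 17, 16, 14, 17, 19,
    22, 17, 19, 28, 20, 25, 27, 23, 15, 23,
    13, 19, 9, 14, 16, 14, 18, 18, 12, 20,
    11, 18, 22, 19, 16, 23, 12, 16, 14, 13,
    20, 22, 22, 22, 22, 15, 15, 18, 18, 26,
    18, 20, 14, 9, 30, 23, 19, 19, 17, 15,
    17, 12, 19, 21, 20, 17, 15, 16, 14, 14,
    22, 20, 25, 15, 26, 10, 11, 13, 16, 15,
    14, 21, 18, 24, 21, 19, 19, 20, 17, 16,
]

LOWER_LIMIT = 8.5   # lower limit of the first class, 8.5-10.4
WIDTH = 2           # class interval i, in bushels
N_CLASSES = 12

def class_index(x):
    """Index of the two-bushel class that contains yield x (0 = 8.5-10.4)."""
    return int((x - LOWER_LIMIT) // WIDTH)

tally = Counter(class_index(x) for x in yields)
frequencies = [tally[k] for k in range(N_CLASSES)]

for k, f in enumerate(frequencies):
    low = LOWER_LIMIT + WIDTH * k
    print(f"{low:4.1f}-{low + WIDTH - 0.1:4.1f}  {'|' * f}  {f}")

print(len(yields), sum(yields), round(sum(yields) / len(yields), 2))  # 160 2855 17.84
```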

The computation illustrated by Worksheet No. 15 is called the long method because it usually runs into large numbers and totals. Since the mid-point of each class is multiplied by its frequency, the products, and still more their total, are likely to be large numbers, frequently running into the thousands, or even millions. It is also called the long method in contrast with the short method illustrated in Worksheet No. 16.

WORKSHEET NO. 15

Computation of Arithmetic Mean from a Frequency Distribution by the Long Method for Wheat Yields on 160 Farms

Class Intervals     Mid-points    Frequencies    Products of Mid-points
Bushels of Wheat    m             f              and Frequencies, fm
 8.5-10.4            9.5             4                38.0
10.5-12.4           11.5            11               126.5
12.5-14.4           13.5            20               270.0
14.5-16.4           15.5            29               449.5
16.5-18.4           17.5            30               525.0
18.5-20.4           19.5            25               487.5
20.5-22.4           21.5            21               451.5
22.5-24.4           23.5             9               211.5
24.5-26.4           25.5             6               153.0
26.5-28.4           27.5             3                82.5
28.5-30.4           29.5             1                29.5
30.5-32.4           31.5             1                31.5
Total                              160             2,856.0

$$\bar{X} = \frac{\sum fm}{N} = \frac{2{,}856.0}{160} = 17.85 \text{ bushels}$$
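Here is Formula No. 3 applied to the frequency distribution of Worksheet No. 15. Mid-points ending in .5 are exact binary fractions, so the floating-point total comes out exactly 2,856.0. The function name is illustrative.

```python
# Worksheet No. 15: mid-points m and frequencies f of the wheat-yield classes.
midpoints = [9.5 + 2 * k for k in range(12)]        # 9.5, 11.5, ..., 31.5
frequencies = [4, 11, 20, 29, 30, 25, 21, 9, 6, 3, 1, 1]

def mean_long_method(m, f):
    """Formula No. 3: total of the products f*m divided by N, the total of f."""
    n = sum(f)
    return sum(fi * mi for fi, mi in zip(f, m)) / n

sum_fm = sum(fi * mi for fi, mi in zip(frequencies, midpoints))
print(sum(frequencies), sum_fm, mean_long_method(midpoints, frequencies))
# 160 2856.0 17.85
```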


ARITHMETIC MEAN BY SHORT METHOD

Meaning of symbols:

A = assumed mean, the mid-point opposite zero (0) deviation.
x' = deviation from an assumed mean. It shortens the process to take these deviations in class intervals.
c = correction for assumed mean.

Formula No. 4

$$c = \frac{\sum fx'}{N}\, i$$

$$\bar{X} = A + c, \quad \text{or} \quad \bar{X} = A + \frac{\sum fx'}{N}\, i$$
This short method is likely to seem longer and more complicated to the beginner, but in making a complete statistical analysis
of a large sample, it is a great time saver as will be clearly indicated 
in later chapters. It eliminates almost all decimal fractions and 
large totals. 

WORKSHEET NO. 16

Computation of Arithmetic Mean by the Short Method for Wheat Yields on 160 Farms in the Hard Winter Wheat Belt

                                                 Example 1       Example 2
Class Intervals     Mid-points   Frequencies     x'     fx'      x'     fx'
Bushels of Wheat    m            f
 8.5-10.4            9.5           4             -4     -16      -6     -24
10.5-12.4           11.5          11             -3     -33      -5     -55
12.5-14.4           13.5          20             -2     -40      -4     -80
14.5-16.4           15.5          29             -1     -29      -3     -87
16.5-18.4           17.5          30              0       0      -2     -60
18.5-20.4           19.5          25             +1     +25      -1     -25
20.5-22.4           21.5          21             +2     +42       0       0
22.5-24.4           23.5           9             +3     +27      +1      +9
24.5-26.4           25.5           6             +4     +24      +2     +12
26.5-28.4           27.5           3             +5     +15      +3      +9
28.5-30.4           29.5           1             +6      +6      +4      +4
30.5-32.4           31.5           1             +7      +7      +5      +5
Total                            160                    +28            -292

Example 1

$$\bar{X} = A + \frac{\sum fx'}{N}\, i = 17.5 + \frac{28}{160} \times 2 = 17.5 + \frac{56}{160} = 17.5 + .35 = 17.85$$

Example 2

$$\bar{X} = A + \frac{\sum fx'}{N}\, i = 21.5 + \frac{-292}{160} \times 2 = 21.5 + \frac{-584}{160} = 21.5 - 3.65 = 17.85$$


In Worksheet No. 16 the arithmetic mean is computed from two assumed means to prove that no matter at what point the assumed mean is placed, the results will be the same. In Example 1, the assumed mean is 17.5, the mid-point of class 16.5-18.4. This is slightly smaller than the true mean. We, therefore, have a plus correction, +28, in the fx' column, which signifies that the assumed mean must be increased by the amount of this correction,

$$c = \frac{\sum fx'}{N}\, i = \frac{28}{160} \times 2 = .35,$$

which when added to the assumed mean (17.5 + .35 = 17.85) gives the correct mean. In Example 2, the assumed mean, 21.5, the mid-point opposite zero (0) in the deviation column, is too large, and, therefore, gives a correction value of -292 in the fx' column, which means that the correction must be subtracted from the assumed mean in this case:

$$\bar{X} = A + c = 21.5 + \frac{-292}{160} \times 2 = 21.5 - 3.65 = 17.85.$$

This is identical with the mean in Example 1. The same result would be obtained by this method regardless of the class in which the zero (0) deviation is placed. In actual practice, it is better to place the zero (0) as near the middle of the frequency distribution as possible. The middle location reduces the size of the correction figures and shortens the work. This is one of the most important worksheets in all statistics. It should be mastered.
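The claim that any assumed mean gives the same result can be checked by trying every class as the zero class. This sketch of Formula No. 4 uses Python's `fractions.Fraction` so the corrections stay exact. The name `zero_class` is ours, for the index of the class opposite zero (0) deviation. The reason it always works is algebraic: since x' = (m - A)/i, the expression A + (Σfx'/N) i reduces to Σfm/N, the long-method mean.

```python
from fractions import Fraction

# Worksheet No. 16 data, with mid-points kept as exact fractions.
midpoints = [Fraction(19, 2) + 2 * k for k in range(12)]   # 9.5 ... 31.5
frequencies = [4, 11, 20, 29, 30, 25, 21, 9, 6, 3, 1, 1]
WIDTH = 2  # class interval i

def mean_short_method(midpoints, frequencies, zero_class, width=WIDTH):
    """Formula No. 4: X-bar = A + (sum of f*x' / N) * i.

    zero_class is the index of the class opposite zero (0) deviation; its
    mid-point is the assumed mean A, and x' counts classes above (+) or
    below (-) it. Returns (A, sum of f*x', correction c, mean).
    """
    a = midpoints[zero_class]
    n = sum(frequencies)
    sum_fx = sum(f * (k - zero_class) for k, f in enumerate(frequencies))
    c = Fraction(sum_fx, n) * width
    return a, sum_fx, c, a + c

for zero_class in (4, 6):  # Example 1 (A = 17.5) and Example 2 (A = 21.5)
    a, sum_fx, c, mean = mean_short_method(midpoints, frequencies, zero_class)
    print(float(a), sum_fx, float(c), float(mean))
# 17.5 28 0.35 17.85
# 21.5 -292 -3.65 17.85

# Every possible choice of zero class gives one and the same mean.
print({mean_short_method(midpoints, frequencies, k)[3] for k in range(12)})
# {Fraction(357, 20)}
```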

It should be observed that the mean computed by the long 
method is identical with that obtained from the short method. It 
is also worth noting that all these means obtained from frequency 
distributions are slightly different from that given in Worksheet 
No. 13 from the individual items of data. Such a difference will generally appear when a mean is computed from a frequency distribution of any data. In this case the difference is small, 17.84 against 17.85 bushels, or .01. In many
series it will be larger. The reason for this variation is that in 
frequency distributions each item of data in each class is imputed 
or assumed to be worth the mid-point of its class. The frequent 
deviation of the value of data from the class mid-point makes 
this assumption slightly incorrect. This small error, however, is 
not sufficient to make any material difference in the practical or 
theoretical results. Since the short method has so many other 
advantages it should ordinarily be used for all samples of 100 or
more items and may be used for much smaller samples. 

In Worksheet No. 17 the average unit cost of family units in 
two-family unit houses in ten large South Atlantic cities is found 
to be $2,577.70.

In all these examples it should be noted that the mean is 
stated in the units of the original data. 

WORKSHEET NO. 17

Computation of Arithmetic Mean of Building Permit Valuation per Family Dwelling Unit of 2-Family Structures in 9 South Atlantic Cities of Over 100,000 Population, 1939*

                                           Short Method        Long Method
Class Interval (i)    m        f           x'      fx'         fm
$  500-  999           750       6         -4      -24           4,500
$1,000-1,499         1,250      16         -3      -48          20,000
$1,500-1,999         1,750     140         -2     -280         245,000
$2,000-2,499         2,250     164         -1     -164         369,000
$2,500-2,999         2,750     124          0        0         341,000
$3,000-3,499         3,250      82         +1      +82         266,500
$3,500-3,999         3,750      18         +2      +36          67,500
$4,000-4,499         4,250      12         +3      +36          51,000
$4,500-4,999         4,750       4         +4      +16          19,000
$5,000-5,499         5,250      14         +5      +70          73,500
$5,500-5,999         5,750      12         +6      +72          69,000
Total                          592                -204       1,526,000

* Source: U.S. Dept. of Labor Bulletin No. 689, p. 13.

Short Method

$$\bar{X} = A + \frac{\sum fx'}{N}\, i = \$2{,}750 + \frac{-204}{592} \times 500 = \$2{,}750 - 172.30 = \$2{,}577.70$$

Long Method

$$\bar{X} = \frac{\sum fm}{N} = \frac{1{,}526{,}000}{592} = \$2{,}577.70$$


Worksheet No. 17 includes the long method and the short method 
for the sake of contrast and comparison. Ordinarily in computing a mean one would not use both methods, and would need to make only one of the solutions, which usually would be the short one. The contrast is clearly shown in Worksheet No. 17, in which the total of the short method is -204, while that of the long method
is 1,526,000. The greatest advantages of the short method are 
revealed in the next three chapters in which it is used to compute 
from two to five or six statistics from one worksheet. It is a 
means of great economy in time and labor. 
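The following sketch sets both computations of Worksheet No. 17 side by side, using exact fractions. The variable names are illustrative. The two totals differ enormously (-204 against 1,526,000), while the means agree exactly.

```python
from fractions import Fraction

# Worksheet No. 17: classes $500 wide, mid-points $750 ... $5,750.
midpoints = [750 + 500 * k for k in range(11)]
frequencies = [6, 16, 140, 164, 124, 82, 18, 12, 4, 14, 12]
zero_class, width = 4, 500          # assumed mean A = $2,750

n = sum(frequencies)
long_total = sum(f * m for f, m in zip(frequencies, midpoints))             # sum of fm
short_total = sum(f * (k - zero_class) for k, f in enumerate(frequencies))  # sum of fx'

long_mean = Fraction(long_total, n)
short_mean = midpoints[zero_class] + Fraction(short_total, n) * width

print(n, long_total, short_total)                          # 592 1526000 -204
print(long_mean == short_mean, round(float(long_mean), 2))  # True 2577.7
```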


WEIGHTED ARITHMETIC MEAN 

In the computation of index numbers and in averaging ratios 
and products or other means, it is necessary to weight the items 
in the computation of the average, if a correct result is to be ob- 
tained. From one point of view all arithmetic means are weighted. 
If no specific weight is assigned to the several items included, each item has a weight of one. All are weighted equally.
In computing averages from class intervals, the frequency dis- 
tribution may be considered as a series of weights for the various 
mid-points. In many cases, however, specific and varying weights 
are assigned to the several items. For instance, in averaging the 
wages of different classes of workmen such as masons, carpenters, 
plasterers, painters, decorators, etc., drawing different rates of 
wages per hour, it would be necessary to weight the various 
wage rates with the number of persons receiving the several 
wages, or the number employed in each kind of work, in order to 
determine the correct average wage payment. If common labor 
received $.30 per hour and skilled labor received $1.20 per hour, the unweighted average would be ($.30 + $1.20) ÷ 2 = $1.50 ÷ 2 = $.75 per hour. But if there are four common laborers for each skilled worker, the true weighted average wage would be:

(4 × $.30 + $1.20) ÷ 5 = $2.40 ÷ 5 = $.48 per hour.

Meaning of symbols : 

X = individual item of data. 

W = weight, or number of times an item of data is counted. 
Formula No. 5

$$\bar{X}_w = \frac{\sum WX}{\sum W}$$


WORKSHEET NO. 18

Weighted Average Wage Rate of 60 Building Trades Workers

Kind of Work    Hourly Wage Rates    Weights: Number Employed    WX
                X                    W
Painters        $1.25                 4                          $ 5.00
Plasterers       1.50                 3                            4.50
Carpenters       1.00                 8                            8.00
Helpers           .50                15                            7.50
Laborers          .30                30                            9.00
Total                                60                          $34.00

$$\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{\$34.00}{60} = \$0.567$$


The weighted average is 56.7 cents per hour. The reason that 
the weighted mean in this case is so much below the high wages 
of painters, plasterers, and carpenters is that there are so few of 
the highly paid workers and so many common laborers. If the 
weights were reversed and there were a larger number of skilled 
workers the weighted mean would be larger. This average is 
illustrated more fully in Chapter 19 on the making of the index 
numbers. In such computations a weighted mean is essential. 
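Here is Formula No. 5 as a small function, applied to Worksheet No. 18. Wages are entered in cents to keep the arithmetic exact. The unweighted average of the five rates, which the worksheet does not give, is printed alongside for contrast. The function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Formula No. 5: the sum of W*X divided by the sum of the weights W."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Worksheet No. 18: hourly wage rates in cents, and number employed.
wage_cents = {"Painters": 125, "Plasterers": 150, "Carpenters": 100, "Helpers": 50, "Laborers": 30}
employed = {"Painters": 4, "Plasterers": 3, "Carpenters": 8, "Helpers": 15, "Laborers": 30}

trades = list(wage_cents)
weighted = weighted_mean([wage_cents[t] for t in trades], [employed[t] for t in trades])
unweighted = sum(wage_cents.values()) / len(wage_cents)
print(round(weighted, 1), unweighted)  # 56.7 91.0
```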


AVERAGING A SERIES OF MEANS 

It is frequently desired to average a series of averages. In such 
cases it is necessary to weight each average with the number of 
items included in the computation of that separate mean. This 
may be illustrated by the average of total retail sales per store 
by counties in New Hampshire in 1939. 
WORKSHEET NO. 19

Comparison of Weighted and Unweighted Retail Sales by Counties for New Hampshire, 1939

County          Average Retail Sales    Weights: No. of Stores    Weighted County
                per Store per County    per County                Averages
Belknap         $ 24,929                   379                    $  9,448,000
Carroll           14,952                   357                       5,338,000
Cheshire          25,613                   455                      11,654,000
Coos              22,187                   562                      12,469,000
Grafton           24,729                   716                      17,706,000
Hillsborough      27,922                 2,091                      58,385,000
Merrimack         26,088                   829                      21,627,000
Rockingham        20,079                 1,070                      22,250,000
Strafford         26,060                   661                      17,226,000
Sullivan          28,562                   315                       8,997,000
Total           $241,121                 7,435                    $185,100,000

$$\bar{X} = \frac{\sum X}{N} = \frac{\$241{,}121}{10} = \$24{,}112 \quad \text{(unweighted average of averages, incorrect)}$$

The weighted mean of average county retail sales for New Hampshire is:

$$\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{\$185{,}100{,}000}{7{,}435} = \$24{,}896$$


Weighted Mean = $24,896 
Unweighted Mean = 24,112 

Error of Unweighted Mean $ 784 


To give the correct average of the average sales by counties for 
retail stores it is necessary to weight the county average sales by 
the number of retail stores per county. In fact this method of 
weighting amounts to changing average county sales back to the 
original total county sales. The weighted mean will differ from 
the unweighted mean in proportion to the relative frequency of 
the extremely large or extremely small items in the series. 
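The comparison in Worksheet No. 19 can be reproduced from the worksheet's own columns. This sketch takes the weighted column (WX) as tabulated, already rounded to thousands of dollars, rather than recomputing it from the averages. That way its totals match the worksheet.

```python
# Worksheet No. 19: average retail sales per store (X) and stores (W) by county.
county_averages = [24_929, 14_952, 25_613, 22_187, 24_729,
                   27_922, 26_088, 20_079, 26_060, 28_562]
stores = [379, 357, 455, 562, 716, 2_091, 829, 1_070, 661, 315]
# Weighted column (WX) exactly as tabulated in the worksheet, in dollars.
weighted_column = [9_448_000, 5_338_000, 11_654_000, 12_469_000, 17_706_000,
                   58_385_000, 21_627_000, 22_250_000, 17_226_000, 8_997_000]

unweighted_mean = sum(county_averages) / len(county_averages)   # incorrect here
weighted_mean = sum(weighted_column) / sum(stores)              # correct
print(round(unweighted_mean), round(weighted_mean))             # 24112 24896
print(round(weighted_mean) - round(unweighted_mean))            # 784
```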
SUMMARY OF ARITHMETIC MEAN

1. The arithmetic mean may be computed from the original 
unorganized data by dividing the sum of the items by their number. 

2. It may be computed from data organized into class intervals 
by multiplying the mid-point of each class by the class frequency, 
summing these products for all classes and dividing this sum by 
the number of items. This process is called the long method. 

3. It may be computed by assuming the mid-point of some 
class to be the mean and then computing the algebraic sum of the 
class products of the class deviations times the class frequency, 
dividing the sum by N and adding algebraically this correction 
sum to the assumed mean. This is called the short method because 
it saves time and labor. 

4. The weighted arithmetic mean is computed by multiplying 
each item, or each mid-point, by a number, called a weight, which 
includes the item in the sum as many times as there are units in 
the weight, and dividing the summed weighted items by the sum 
of the weights. 

5. The arithmetic mean is a complete measure of central tend- 
ency because it includes both the number and the size of the 
averaged items. 

6. It is the point which exactly balances the frequency dis- 
tribution of absolute numbers, so that there is equal weight 
(number X size) on either side of it. 

7. It is sound algebraically and may be easily manipulated in 
equations. 

8. It is the most useful and widely used average in statistics 
for normal distributions of absolute numbers. 
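Points 2 and 3 can be checked against each other in a short Python sketch, assuming the wheat-yield distribution of Worksheet No. 25 with class mid-points 9.5, 11.5, ..., 31.5. Both methods should give 17.85 bushels, the value Table 16 lists for the two class-interval methods. The assumed mean of 17.5 is an arbitrary choice; any other class mid-point gives the same result.

```python
# Wheat yields on 160 farms (Worksheet No. 25): lower class limits and frequencies.
lower_limits = [8.5, 10.5, 12.5, 14.5, 16.5, 18.5, 20.5, 22.5, 24.5, 26.5, 28.5, 30.5]
freqs = [4, 11, 20, 29, 30, 25, 21, 9, 6, 3, 1, 1]
width = 2.0  # class interval i; a class such as 8.5-10.4 runs up to 10.5

midpoints = [low + width / 2 for low in lower_limits]
N = sum(freqs)

# Long method: sum of (mid-point x frequency), divided by N.
long_mean = sum(m * f for m, f in zip(midpoints, freqs)) / N

# Short method: assume the mean is the mid-point of the 16.5-18.4 class,
# express each class deviation in class-interval units, then correct.
assumed_mean = 17.5
deviations = [round((m - assumed_mean) / width) for m in midpoints]
correction = sum(f * d for f, d in zip(freqs, deviations)) / N * width
short_mean = assumed_mean + correction

print(round(long_mean, 2), round(short_mean, 2))  # 17.85 17.85
```

The long method is itself a weighted mean (point 4) in which the class frequencies serve as the weights.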


GEOMETRIC MEAN 

The geometric mean is the nth root of the product of n factors. 
It is useful for averaging many kinds of ratios and percentage 
changes. It is especially useful in making price relative index 
numbers. If we wish to average the percentage change in the 
price of cotton when it falls from $.10 a pound to $.05 a pound, 
and later rises to $.10 a pound, only the geometric mean will give
us the correct answer.


Year     Price of Cotton     Price Relatives, %
1931          $.10
1932           .05             50, fell to 50% of 1931 price
1933           .10            200, rose to 200% of 1932 price

Average of Price Relatives by Arithmetic Mean

\[\frac{50 + 200}{2} = 125\% \text{ change (incorrect)}\]

Average of Price Relatives by Geometric Mean

\[\sqrt{50 \times 200} = \sqrt{10{,}000} = 100\% \text{ (correct)}\]

It is quite evident that if the price of cotton was $.10 in 1931
and $.10 in 1933, the latter price is 100% of the former price,
even if there had been a fall and a rise in the meantime. The
arithmetic mean makes the 1933 price of $.10 125% of the 1931
price of $.10, which is clearly incorrect. Only the geometric mean
follows these price changes through to the correct result. In the
vast economic field of price changes, its use is necessary and
satisfactory. Many index numbers are computed by the use of the
geometric mean.
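The cotton example can be reproduced in a few lines of Python. This sketch uses only the standard library (`math.prod` needs Python 3.8 or later). It also checks the result by chaining the two relatives, a test the geometric mean passes and the arithmetic mean fails.

```python
import math

prices = {1931: 0.10, 1932: 0.05, 1933: 0.10}  # price of cotton per pound

# Each year's price as a percentage of the preceding year's price.
relatives = [
    100 * prices[1932] / prices[1931],  # 50: fell to 50% of the 1931 price
    100 * prices[1933] / prices[1932],  # 200: rose to 200% of the 1932 price
]

arithmetic = sum(relatives) / len(relatives)              # 125: wrongly implies a net rise
geometric = math.prod(relatives) ** (1 / len(relatives))  # 100: 1933 price equals 1931 price

# Two steps at the geometric mean reproduce the overall change from 1931 to 1933.
overall = 100 * prices[1933] / prices[1931]
assert math.isclose((geometric / 100) ** 2 * 100, overall)
print(arithmetic, geometric)
```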

When the data are manipulated by single items and not by 
class intervals, the formula is: 

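Formula No. 6

\[G = \sqrt[N]{X_1 \cdot X_2 \cdot X_3 \cdots X_N}\]

The geometric mean of 2 and 8 is 4: \(\sqrt{2 \times 8} = \sqrt{16} = 4\).
That of 3, 6, and 12 is 6: \(\sqrt[3]{3 \times 6 \times 12} = \sqrt[3]{216} = 6\).
The difficulty of extracting high roots is obviated by the use of
logarithms.

Formula No. 7

\[\log G = \frac{\log X_1 + \log X_2 + \cdots + \log X_N}{N}\]

or abbreviated,

\[\log G = \frac{\Sigma(\log X)}{N}\]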
WORKSHEET NO. 20

X        Log X
104      2.01703
95       1.97772
112      2.04922
97       1.98677
82       1.91381
         9.94455

\[\log G = \frac{\Sigma(\log X)}{N} = \frac{9.94455}{5} = 1.98891\]

G = antilog of 1.98891
G = 97.48
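Formula No. 7 translates directly into code. In this sketch `geometric_mean` is an illustrative helper name, not a library function. Like Worksheet No. 20, it averages the common logarithms and then takes the antilog.

```python
import math

def geometric_mean(values):
    """Formula No. 7: log G = sum(log X) / N; G is the antilog."""
    log_g = sum(math.log10(x) for x in values) / len(values)
    return 10 ** log_g

print(round(geometric_mean([2, 8]), 4))        # 4.0
print(round(geometric_mean([3, 6, 12]), 4))    # 6.0
g = geometric_mean([104, 95, 112, 97, 82])     # Worksheet No. 20
print(round(g, 2))                             # 97.48

# Formula No. 6 taken directly (the N-th root of the product) agrees; with many
# or large items the product grows unwieldy, which is why logarithms are used.
assert math.isclose(g, math.prod([104, 95, 112, 97, 82]) ** (1 / 5))
```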


For the weighted geometric mean, so widely used in price index
numbers and percentage changes, the formula is:

Weighted Geometric Mean
Formula No. 8

\[\log G_w = \frac{W_1 \log X_1 + W_2 \log X_2 + \cdots + W_N \log X_N}{\Sigma W}\]


WORKSHEET NO. 21

X        Log X       W      W Log X
104      2.01703     12     24.20436
95       1.97772      7     13.84404
112      2.04922     10     20.49220
97       1.98677      2      3.97354
82       1.91381      1      1.91381
                     32     64.42795

\[\log G_w = \frac{\Sigma(W \log X)}{\Sigma W} = \frac{64.42795}{32} = 2.013373\]

\(G_w\) = antilog of 2.013373, or 103.13


The reason the weighted geometric mean in this particular
case (103.13) is so much larger than the unweighted geometric mean
from the same data (97.48) is that the larger numbers bear the heavy
weights: 104 is weighted with 12 and 112 with 10, while 97 has a
weight of 2 and 82 only 1. If the heavy weights had been placed on
the small numbers, the results would have been reversed.
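The sketch below applies Formula No. 8 to the data of Worksheet No. 21. The `reversed_weights` list is a hypothetical rearrangement of the same five weights, added only to illustrate the closing remark that putting the heavy weights on the small numbers reverses the result.

```python
import math

def weighted_geometric_mean(values, weights):
    """Formula No. 8: log G_w = sum(W * log X) / sum(W); G_w is its antilog."""
    log_gw = sum(w * math.log10(x) for x, w in zip(values, weights)) / sum(weights)
    return 10 ** log_gw

values = [104, 95, 112, 97, 82]            # Worksheet No. 21
weights = [12, 7, 10, 2, 1]

heavy_on_large = weighted_geometric_mean(values, weights)
unweighted = weighted_geometric_mean(values, [1] * len(values))   # equal weights

# Hypothetical reversal: the same weights, heaviest on the smallest items
# (82 gets 12, 95 gets 10, 97 gets 7, 104 gets 2, 112 gets 1).
reversed_weights = [2, 10, 1, 7, 12]
heavy_on_small = weighted_geometric_mean(values, reversed_weights)

print(round(heavy_on_large, 2), round(unweighted, 2), round(heavy_on_small, 2))
```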
Geometric Mean with Class Intervals

Formula No. 9

\[\log G = \frac{\Sigma(f \log m)}{N}\]

WORKSHEET NO. 22

Computation of Geometric Mean of Price Relatives of 485
Commodities Measuring Price Changes from 1913 to 1918

Price          Mid-     Fre-       Logarithm of    Frequencies ×
Relatives      points   quencies   Mid-point       Logarithm of Mid-points
i              m        f          Log m           f Log m
50-69.9          60       2        1.778151           3.556302
70-89.9          80       4        1.903090           7.612360
90-109.9        100       3        2.000000           6.000000
110-129.9       120      23        2.079181          47.821163
130-149.9       140      28        2.146128          60.091584
150-169.9       160      78        2.204120         171.921360
170-189.9       180      92        2.255273         207.485116
190-209.9       200      62        2.301030         142.663860
210-229.9       220      47        2.342423         110.093881
230-249.9       240      29        2.380211          69.026119
250-269.9       260      30        2.414973          72.449190
270-289.9       280      24        2.447158          58.731792
290-309.9       300      26        2.477121          64.405146
310-329.9       320      17        2.505150          42.587550
330-349.9       340       7        2.531479          17.720353
350-369.9       360       0        2.556303
370-389.9       380       2        2.579784           5.159568
390-409.9       400       1        2.602060           2.602060
410-429.9       420       3        2.623249           7.869747
430-449.9       440       0        2.643453
450-469.9       460       1        2.662758           2.662758
470-489.9       480       2        2.681241           5.362482
770-789.9       780       1        2.892095           2.892095
870-889.9       880       1        2.944483           2.944483
1070-1089.9    1080       1        3.033424           3.033424
2130-2149.9    2140       1        3.330414           3.330414
Total                   485                        1118.022807

Source: Bureau of Labor Statistics, Wholesale Prices 1890-1925,
Bulletin No. If.16.

\[\log G = \frac{\Sigma(f \log m)}{N} = \frac{1118.022807}{485} = 2.3052017\]

Antilogarithm = 201.93

The arithmetic mean for these data would be 216.54, which
would be an incorrect average for these price relative changes.
Since no price relative can fall below zero (0) but may rise to
unlimited heights, the correct middle balance between these
opposite movements can be computed only by a relative, or
geometric, average. In this case the price relatives range between
50 and 2150. The absolute lower limit is 0; the upper limit is
infinity. Prices cannot fall more than 100% but may rise hundreds
of thousands or even millions of percent. To strike a correct
balance between these converse ratios, only the nth root of their
product will suffice. This method treats a rise and a fall of the
same ratio (a doubling and a halving, for example) as equal and
opposite movements. The arithmetic mean overemphasizes the large
upward-moving percentages by treating them as absolute numbers
instead of as the ratios, or relative numbers, that they are.
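The grouped computation of Worksheet No. 22 can be sketched as follows, with each class mid-point standing in for every commodity in its class. Working from full-precision logarithms, the sketch gives about 201.93 for the geometric mean and 216.54 for the arithmetic mean.

```python
import math

# Worksheet No. 22: (mid-point m, frequency f) for each class of price relatives.
table = [
    (60, 2), (80, 4), (100, 3), (120, 23), (140, 28), (160, 78), (180, 92),
    (200, 62), (220, 47), (240, 29), (260, 30), (280, 24), (300, 26),
    (320, 17), (340, 7), (360, 0), (380, 2), (400, 1), (420, 3), (440, 0),
    (460, 1), (480, 2), (780, 1), (880, 1), (1080, 1), (2140, 1),
]
N = sum(f for _, f in table)

# Formula No. 9: log G = sum(f * log m) / N
log_g = sum(f * math.log10(m) for m, f in table) / N
geometric = 10 ** log_g

# The arithmetic mean treats the relatives as absolute numbers.
arithmetic = sum(f * m for m, f in table) / N

print(N, round(geometric, 2), round(arithmetic, 2))
```

The few very large relatives (1080, 2140) raise the arithmetic mean far more than the geometric mean, which is the overemphasis described above.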

CONTRAST OF GEOMETRIC AND ARITHMETIC MEANS 

The relationship and contrast between the arithmetic mean and 
the geometric mean may be further illustrated by the following 
example. 

Substitution Contrast

Arithmetic Mean                    Geometric Mean

Item    Mean
2       12.4       \(2 \times 4 \times 8 \times 16 \times 32 = 32{,}768\)
4       12.4       \(\sqrt[5]{32{,}768} = 8\)
8       12.4       \(8 \times 8 \times 8 \times 8 \times 8 = 32{,}768\)
16      12.4
32      12.4
62      62.0

\(\bar{X} = \frac{\Sigma X}{N} = \frac{62}{5} = 12.4\)

1. The arithmetic mean of a series of items may be substituted
for each item in the series and give the same sum. The geometric
mean of a series of items may be substituted for each item in the
series and give the same product.

Deviation and Balance Contrast

Arithmetic Mean                    Geometric Mean

 2 - 12.4 = - 10.4                 \(\frac{8}{2} \times \frac{8}{4} = \frac{16}{8} \times \frac{32}{8}\)
 4 - 12.4 = -  8.4                 \(4 \times 2 = 2 \times 4\)
 8 - 12.4 = -  4.4                 \(8 = 8\)
            - 23.2
16 - 12.4 = +  3.6
32 - 12.4 = + 19.6
            + 23.2
Total          0

2. The deviations above and below an arithmetic mean are equal
in amount and opposite in sign, so that they total zero (0). The
products of the ratios above and below a geometric mean are equal.

Absolute vs. Relative Contrast

3. The arithmetic mean should be used to average absolute
numbers. The geometric mean should be used to average ratios,
percentages, or relative numbers.
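The first two contrasts can be verified numerically for the series 2, 4, 8, 16, 32. In this illustrative Python sketch the `math.isclose` checks absorb the floating-point rounding in the fifth root.

```python
import math

items = [2, 4, 8, 16, 32]
n = len(items)

am = sum(items) / n                    # 62 / 5 = 12.4
gm = math.prod(items) ** (1 / n)       # fifth root of 32,768 = 8 (to rounding)

# 1. Substitution: n copies of the mean restore the sum (AM) or the product (GM).
assert math.isclose(am * n, sum(items))
assert math.isclose(gm ** n, math.prod(items))

# 2. Balance: deviations from the arithmetic mean cancel out ...
deviation_total = sum(x - am for x in items)

# ... while for the geometric mean the ratios on either side balance:
# (8/2) x (8/4) below equals (16/8) x (32/8) above.
below = [x for x in items if x < gm and not math.isclose(x, gm)]
above = [x for x in items if x > gm and not math.isclose(x, gm)]
ratio_below = math.prod(gm / x for x in below)
ratio_above = math.prod(x / gm for x in above)

print(am, round(gm, 6), round(ratio_below, 6), round(ratio_above, 6))
```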

Simple and Compound Interest Contrast 

4. The accumulation of compound interest is another relationship 
which can be measured correctly only by the method of the geo- 
metric mean. 

Symbols 

\(P_0\) = amount of principal at beginning of period
\(P_n\) = amount of principal at end of period
r = rate of interest
N = number of years in the period.

If the interest is compounded annually the equation is:

\[P_n = P_0(1 + r)^N\]
If we solve for r we have:

\[\frac{P_n}{P_0} = (1 + r)^N \quad \text{or} \quad \sqrt[N]{\frac{P_n}{P_0}} = 1 + r\]

or

\[r = \sqrt[N]{\frac{P_n}{P_0}} - 1\]

At what percent would $500 double in 10 years?

\[r = \sqrt[10]{\frac{1000}{500}} - 1 = \sqrt[10]{2} - 1\]

The logarithm of 2 is .30103. The logarithm of the 10th root of 2
is \(\frac{.30103}{10}\), or .030103. The antilogarithm of .030103 is 1.0718.

Therefore \(r = \sqrt[10]{2} - 1 = 1.0718 - 1 = .0718\).

The rate of interest at which $500 would become $1,000 in ten
years compounded annually is 7.18%.


Simple or Arithmetic Interest Rate

  $500.00    principal
  × .0718    rate
   $35.90    interest for one year
     × 10    years
  $359.00 = simple interest for 10 years
  $500.00 = compound interest for 10 years.
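A short sketch of the interest example, assuming annual compounding as in the text. `compound_rate` is an illustrative helper that solves \(r = \sqrt[N]{P_n/P_0} - 1\) with logarithms; the last lines contrast simple and compound interest at that rate.

```python
import math

def compound_rate(p0, pn, years):
    """r = N-th root of (Pn / P0) - 1, taking the root with logarithms as in the text."""
    return 10 ** (math.log10(pn / p0) / years) - 1

principal = 500.0
r = compound_rate(principal, 1000.0, 10)
print(f"rate {r:.4f}")                                   # rate 0.0718

# Compound interest: interest is earned on the whole accumulated amount each year.
compound_interest = principal * (1 + r) ** 10 - principal

# Simple interest at the same rate (rounded to .0718 as in the text):
# interest is earned on the original principal only.
simple_interest = principal * round(r, 4) * 10

print(f"simple {simple_interest:.2f}, compound {compound_interest:.2f}")
# simple 359.00, compound 500.00
```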


HARMONIC MEAN 

The harmonic mean is based on the reciprocals of the numbers 
averaged. It is the reciprocal of the arithmetic mean of the 
reciprocals of the numbers averaged. It is restricted in its field 
of usefulness, but is necessary in averaging different rates of 
speed for equal given distances. One is no doubt familiar with the
old problem in eighth-grade arithmetic, "If James can dig a
ditch in two days, and John can dig it in three days, how long
will it take the two boys working together to dig it?" Such
problems are based on the principle of the harmonic mean.

The following illustration will indicate the assumption which
conditions its proper use. Two boys, A and B, ride bicycles. A
travels 10 miles per hour; B travels 15 miles per hour. The two
elements of the problem are time (T) and distance (D).
Assumption I. Both boys ride for two hours. The speeds are
weighted with equal periods of time. This assumption requires the
arithmetic mean. The solution is as follows:

A travels 10 miles per hour
B travels 15 miles per hour
\(\frac{10 + 15}{2} = 12.5\) miles per hour,

or in two hours A travels 20 miles and B travels 30 miles. The
total time traveled is 4 hours; the total distance is 50 miles.
\(\frac{50}{4} = 12.5\) miles per hour. For the assumption of equal time
periods the arithmetic mean is required.

Assumption II. Both boys ride 30 miles. The speeds are weighted
in this case with equal distances. This assumption requires the
harmonic mean. The solution is as follows:

A travels 30 miles in 3 hours
B travels 30 miles in 2 hours

Together they travel 60 miles in 5 hours. \(\frac{60}{5} = 12\) miles per hour,
or

A travels one mile in \(\frac{1}{10}\) of an hour
B travels one mile in \(\frac{1}{15}\) of an hour

\[\frac{\frac{1}{10} + \frac{1}{15}}{2} = \frac{\frac{5}{30}}{2} = \frac{5}{60} = \frac{1}{12}\]

They average a mile in \(\frac{1}{12}\) of an hour, or 12 miles per hour.

In averaging time rates when the various rates of speed are
weighted by equal distances, only the harmonic mean gives the
correct result.¹
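Both assumptions can be computed side by side in Python. The sketch uses `statistics.harmonic_mean` from the standard library only as a cross-check on the hand calculation.

```python
from statistics import harmonic_mean

speeds = [10, 15]                     # miles per hour for A and B

# Assumption I: equal times. Each boy rides 2 hours, so the distances differ.
hours = 2
total_miles = sum(s * hours for s in speeds)                  # 20 + 30 = 50
speed_equal_times = total_miles / (hours * len(speeds))       # 50 / 4 = 12.5

# Assumption II: equal distances. Each boy rides 30 miles, so the times differ.
miles = 30
total_hours = sum(miles / s for s in speeds)                  # 3 + 2 = 5
speed_equal_distances = miles * len(speeds) / total_hours     # 60 / 5 = 12.0

print(speed_equal_times, sum(speeds) / len(speeds))           # 12.5 12.5
print(speed_equal_distances, harmonic_mean(speeds))
```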

During World War I, the United States Government statisticians
had to use this mean to average the rates of speed of ships
carrying supplies to our armies in France. In World War II, we
again find it necessary to measure the average load of supplies
we can maintain in Ireland, Egypt, Australia, Alaska, and the
East Indies. It is useful in averaging the supplies of iron ore
that can be kept moving on our Great Lakes. Modern industry
depends so heavily on speed in work and transportation that the
harmonic mean has a relatively more important place in present-day
business than in the past.

¹ See "The Nature and Use of the Harmonic Mean," by Wirth F. Ferger,
Journal of the American Statistical Association, March, 1931, p. 36.

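Formula No. 10, used with individual items

\[\frac{1}{H} = \frac{\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_N}}{N}\]

Formula No. 11, used with class intervals

\[\frac{1}{H} = \frac{\Sigma f \frac{1}{m}}{N}\]

in which \(\frac{1}{m}\) is the reciprocal of the mid-point of the class interval.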
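Formulas No. 10 and No. 11 can be sketched as two small Python functions (illustrative names, not a library API). The grouped example uses hypothetical mid-points and frequencies. Expanding each class into f copies of its mid-point shows that Formula No. 11 is simply Formula No. 10 applied to grouped data.

```python
def harmonic_mean_items(values):
    """Formula No. 10: 1/H = (1/X1 + 1/X2 + ... + 1/XN) / N."""
    return len(values) / sum(1 / x for x in values)

def harmonic_mean_grouped(midpoints, freqs):
    """Formula No. 11: 1/H = sum(f * (1/m)) / N, with N = sum(f)."""
    n = sum(freqs)
    return n / sum(f / m for m, f in zip(midpoints, freqs))

# Hypothetical grouped data (illustration only).
midpoints = [10, 20, 40]
freqs = [2, 3, 1]

grouped = harmonic_mean_grouped(midpoints, freqs)

# Expanding each class into f copies of its mid-point gives the same answer.
expanded = [m for m, f in zip(midpoints, freqs) for _ in range(f)]
assert abs(grouped - harmonic_mean_items(expanded)) < 1e-9
print(grouped)
```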


WORKSHEET NO. 23

Computation of Harmonic Mean of Speed of Planets Around
the Sun Showing Difference Between Average Speed
of Minor Planets and Major Planets

Planets          Orbital Velocity,     Reciprocal of
                 Miles per Second      Orbital Velocity

Minor Planets
Mercury               24.0             .041666667
Venus                 21.9             .045662100
Earth                 18.5             .054054054
Mars                  15.0             .066666667
Total                                  .208049488

Major Planets
Jupiter                8.1             .123456790
Saturn                 6.0             .166666667
Uranus                 4.2             .238095240
Neptune                3.8             .263157890
Total                                  .791376587

Mean Reciprocal for Minor Planets = \(\frac{.208049488}{4}\) = .052012372

Harmonic Mean of Minor Planets' Velocity = \(\frac{1}{.052012372}\) = 19.226 miles per second.

Mean Reciprocal for Major Planets = \(\frac{.791376587}{4}\) = .197844147

Harmonic Mean of Major Planets' Velocity = \(\frac{1}{.197844147}\) = 5.054 miles per second.

Harmonic Mean Velocity of Minor Planets     19.226
Harmonic Mean Velocity of Major Planets      5.054

Ratio of Mean Velocities                     3.80 : 1.0


Contrast With Arithmetic Mean

The arithmetic mean of the velocity of the minor planets is
19.85 miles per second.

The arithmetic mean of the velocity of the major planets is
5.525 miles per second.

The ratio between the two mean speeds is \(\frac{19.85}{5.525}\) = 3.59, as
compared with 3.80 for the harmonic mean.
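The planet comparison can be recomputed from the orbital velocities in Worksheet No. 23. This sketch simply reproduces the worksheet's arithmetic and takes the tabulated velocities as given. The `harmonic_mean` helper is an illustrative name.

```python
def harmonic_mean(values):
    """Formula No. 10: N divided by the sum of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

# Worksheet No. 23: orbital velocities in miles per second.
minor = {"Mercury": 24.0, "Venus": 21.9, "Earth": 18.5, "Mars": 15.0}
major = {"Jupiter": 8.1, "Saturn": 6.0, "Uranus": 4.2, "Neptune": 3.8}

hm_minor = harmonic_mean(list(minor.values()))
hm_major = harmonic_mean(list(major.values()))
am_minor = sum(minor.values()) / len(minor)
am_major = sum(major.values()) / len(major)

print(f"harmonic:   {hm_minor:.3f} / {hm_major:.3f} = {hm_minor / hm_major:.2f}")
print(f"arithmetic: {am_minor:.3f} / {am_major:.3f} = {am_minor / am_major:.2f}")
```

The harmonic mean pulls each group toward its slower members, which is why its ratio between the groups is larger than the arithmetic one.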


Harmonic Mean of Speed of Santa Fe Trains

Mean Reciprocal of schedule times of 6 fastest special service trains is
\(\frac{.81559717}{6}\) = .135932862

Harmonic Mean of schedule times of 6 fastest special service trains is
\(\frac{1}{.135932862}\) = 7.36 hours, Kansas City to Chicago.

Mean Reciprocal of schedule times of 14 slower accommodation trains is
\(\frac{1.33588499}{14}\) = .095420356

Harmonic Mean of schedule times of 14 slower accommodation trains is
\(\frac{1}{.095420356}\) = 10.48 hours, Kansas City to Chicago.

Ratio of mean schedules = \(\frac{10.48}{7.36}\) = 1.42, or 1.42 : 1.0, or 142 : 100

The 6 fastest special service trains on the average move about 1.42
times (nearly one and one-half times) as fast as the 14 slower
accommodation trains.

Santa Fe Trains, Arithmetic Mean

Six fast trains              7.37 hours
Fourteen slower trains      10.53 hours

WORKSHEET NO. 24

Computation of Harmonic Mean of Speed of Atchison,
Topeka and Santa Fe Passenger Trains East
and West from Chicago to Kansas City, Mo.

Train      Schedule Time     Reciprocals of
Number     in Hours          Train Time
 1            6.9            .14492754
 2            7.1            .14084507
 3            7.5            .13333333
 4            7.5            .13333333
 5            7.6            .13157895
 6            7.6            .13157895
Total, trains 1-6            .81559717
 7            9.4            .10638298
 8            9.5            .10526316
 9            9.5            .10526316
10            9.8            .10204082
11           10.3            .09708738
12           10.5            .09523810
13           10.6            .09433962
14           10.8            .09259259
15           10.8            .09259259
16           10.9            .09174312
17           11.1            .09009009
18           11.2            .08928571
19           11.3            .08849558
20           11.7            .08547009
Total, trains 7-20          1.33588499
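The same harmonic-mean calculation applied to the schedule times of Worksheet No. 24 reproduces the train comparison. Following the text, the schedule times themselves (in hours) are averaged; `harmonic_mean` is an illustrative helper name.

```python
def harmonic_mean(values):
    """Formula No. 10: N divided by the sum of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

# Worksheet No. 24: schedule times in hours, Kansas City to Chicago.
fast = [6.9, 7.1, 7.5, 7.5, 7.6, 7.6]                      # trains 1-6
slow = [9.4, 9.5, 9.5, 9.8, 10.3, 10.5, 10.6,
        10.8, 10.8, 10.9, 11.1, 11.2, 11.3, 11.7]           # trains 7-20

hm_fast, hm_slow = harmonic_mean(fast), harmonic_mean(slow)
am_fast, am_slow = sum(fast) / len(fast), sum(slow) / len(slow)

print(f"harmonic:   {hm_fast:.2f} h and {hm_slow:.2f} h, ratio {hm_slow / hm_fast:.2f}")
print(f"arithmetic: {am_fast:.2f} h and {am_slow:.2f} h")
```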
MEDIAN

The median for a discrete series may be defined roughly as the 
middle item of a series or an array of data. For a continuous 
series, it is that point in the range of the data that divides the 
data into two equal parts so that there are just as many items on 
one side of it as on the other. As the name implies, the median 
is the middle of the data. It is a less complete measure than the 
mean. It takes into consideration only the number of the items, 
disregarding their size completely. The arithmetic mean of the
seven items 5, 6, 6, 7, 9, 12, and 200 is 35. The median of these
same seven items is the middle item, or 7. The large item, 200, 
has no more effect on the median than the smallest item of 5. 

The steps necessary to locate the median in unorganized data 
without class intervals are as follows: 

1. Arrange the data in an array from the smallest item to the
largest. This method is illustrated in Worksheet No. 2.

2. Select the middle item or, if there is an even number of
items, the simple mean of the two middle items.


Median from Ungrouped Data 
Data: 7, 12, 8, 5, 9, 10, 9 
Array: 5, 7, 8, 9, 9, 10, 12 


Middle item = median = 9. 
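The two steps for ungrouped data can be sketched as a small function, cross-checked against `statistics.median` from the Python standard library. The second call repeats the earlier example in which the item 200 moves the mean but not the median.

```python
from statistics import median as stats_median

def median(values):
    """Middle item of the array, or the mean of the two middle items when N is even."""
    array = sorted(values)                  # step 1: arrange from smallest to largest
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        return array[mid]                   # step 2: the middle item
    return (array[mid - 1] + array[mid]) / 2

data = [7, 12, 8, 5, 9, 10, 9]
print(median(data))                         # 9

skewed = [5, 6, 6, 7, 9, 12, 200]
print(sum(skewed) / len(skewed), median(skewed))   # 35.0 7

assert median(data) == stats_median(data)
```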


Median from Data in Class Intervals
Formula No. 12

\[Me = L + \frac{p}{f}\,i, \qquad \frac{N}{2} = \text{one-half of items, the number on each side of the median}\]

Meaning of symbols:

Me = median

L = lower limit of class interval in which median falls
f = frequency of median class interval

p = part of median class frequency used to make up half of items
i = width of class interval
N = total number of items in sample
WORKSHEET NO. 25

Computation of Median, Quartiles, Quintiles, Deciles,
and Percentiles from Frequency Distribution
of Wheat Yields on 160 Farms

Class Intervals,      Frequencies     Cumulated Frequencies
Bushels of Wheat
i                     f               Forward     Backward
8.5-10.4                4                4          160
10.5-12.4              11               15          156
12.5-14.4              20               35          145
14.5-16.4              29               64          125
16.5-18.4              30               94           96
18.5-20.4              25              119           66
20.5-22.4              21              140           41
22.5-24.4               9              149           20
24.5-26.4               6              155           11
26.5-28.4               3              158            5
28.5-30.4               1              159            2
30.5-32.4               1              160            1
Total                 160

80 = \(\frac{N}{2}\) = one-half of items
64 = sum of frequencies up to median class
16 = p = part of median class frequency used to make \(\frac{N}{2}\)
16.5 = L = lower limit of median class

\[Me = L + \frac{p}{f}\,i = 16.5 + \frac{16}{30} \times 2 = 16.5 + 1.07 = 17.57 \text{ median (bushels)}\]

The median in this continuous series of data is that point on the
range of 8.5 to 32.4 which divides the 160 items into two groups,
with 80 on each side of the median point. The method of computing
it from class intervals is as follows:

1. Divide N by 2 to determine the number of items on each
side of the median point.
2. Accumulate the frequencies from the top of the frequency
column up to but not including the frequency of the median class,
in this case 4 + 11 + 20 + 29 = 64. At this point the median has
not been reached, but when the next frequency is added the median
has been passed. The median, therefore, in this case is in the
class 16.5-18.4, and the frequency of the class in which the median
is located is 30.

3. Subtract 64 from 80. This leaves 16, or 80 - 64, to be
taken out of the frequency of the median class of 30.

4. Up to 16.5 on the range scale there are a total of 64 items.
To this value, 16.5, must be added \(\frac{16}{30}\) of the width of the median
class, or \(\frac{16}{30}\) of 2, which is \(\frac{16}{30} \times 2\), or \(\frac{32}{30}\), or 1.07.

5. 16.5, the lower limit of the median class, plus 1.07 = 17.57,
the median point on the range scale or X-axis.
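Steps 1 to 5 map one-to-one onto the short Python sketch below. It uses the frequencies of Worksheet No. 25 and assumes, as the worksheet does, that the items are spread evenly through the median class.

```python
# Worksheet No. 25: wheat yields on 160 farms.
lower_limits = [8.5, 10.5, 12.5, 14.5, 16.5, 18.5, 20.5, 22.5, 24.5, 26.5, 28.5, 30.5]
freqs = [4, 11, 20, 29, 30, 25, 21, 9, 6, 3, 1, 1]
width = 2.0  # class interval i

half = sum(freqs) / 2                       # step 1: N / 2 = 80

cum, k = 0, 0                               # step 2: accumulate until the next class passes N / 2
while cum + freqs[k] < half:
    cum += freqs[k]
    k += 1
L, f = lower_limits[k], freqs[k]            # median class 16.5-18.4, frequency 30; cum = 64

p = half - cum                              # step 3: 80 - 64 = 16
share = p / f * width                       # step 4: 16/30 of 2 = 1.07
median = L + share                          # step 5: 16.5 + 1.07 = 17.57
print(half, cum, p, round(share, 2), round(median, 2))   # 80.0 64 16.0 1.07 17.57
```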


First and Third Quartiles

Formula No. 13

\[Q_1 = L + \frac{p}{f}\,i, \qquad \frac{N}{4} = \frac{160}{4} = 40\]

\[Q_1 = 14.5 + \frac{5}{29} \times 2 = 14.5 + \frac{10}{29} = 14.5 + .34 = 14.84 \text{ (bushels)}\]

40 = one-fourth of items
35 = sum of frequencies up to first quartile class
5 = p = part of first quartile class frequency used to make \(\frac{N}{4}\)
14.5 = L = lower limit of first quartile class

Formula No. 14

\[Q_3 = L + \frac{p}{f}\,i, \qquad \frac{3N}{4} = \frac{3}{4} \text{ of } 160 = 120\]

\[Q_3 = 20.5 + \frac{1}{21} \times 2 = 20.5 + \frac{2}{21} = 20.5 + .095 = 20.6 \text{ (bushels)}\]

120 = three-fourths of items
119 = sum of frequencies up to third quartile class
1 = p = part of third quartile class frequency used to make \(\frac{3N}{4}\)
20.5 = L = lower limit of third quartile class
Quintiles

Formula No. 15

\[Qui_1 = L + \frac{p}{f}\,i, \qquad \frac{N}{5} = \frac{160}{5} = 32\]

\[Qui_1 = 12.5 + \frac{17}{20} \times 2 = 12.5 + \frac{34}{20} = 12.5 + 1.7 = 14.2 \text{ (bushels)}\]

32 = one-fifth of items
15 = sum of frequencies up to first quintile class
17 = p = part of first quintile class frequency used to make \(\frac{N}{5}\)
12.5 = L = lower limit of first quintile class

The second, third, and fourth quintiles may be computed from
the same general formula with the exception that the second
quintile requires \(\frac{2N}{5}\), the third \(\frac{3N}{5}\), and the fourth \(\frac{4N}{5}\) for the
division of the sample items. Since the quintiles divide the
sample into five sections, they give a more detailed or minute
analysis of the data than do the quartiles. They are frequently
used in reducing educational and social data.


Sextiles, Septiles, Octiles, Deciles, and Percentiles

A frequency distribution may be divided into as many sections
or segments as the type of analysis desired requires. All of the
above measures rest on the same general formula as the median,
\(Me = L + \frac{p}{f}\,i,\ \frac{N}{2}\). They are:

Sextiles, \(Sex. = L + \frac{p}{f}\,i,\ \frac{N}{6}\)

Septiles, \(Sep. = L + \frac{p}{f}\,i,\ \frac{N}{7}\)

Octiles, \(Oct. = L + \frac{p}{f}\,i,\ \frac{N}{8}\)

Deciles, \(Dec. = L + \frac{p}{f}\,i,\ \frac{N}{10}\)

Percentiles, \(P. = L + \frac{p}{f}\,i,\ \frac{N}{100}\)
Deciles, which divide the data into tenths, and percentiles,
which split a series into hundredths, are more widely used than
the others. They are frequently employed in making a detailed
analysis of sales distributions or in population, educational,
and psychological analyses and measuring scales. For wheat
yields on the 160 farms these measures are:

\[Sex_1 = 12.5 + \frac{11.67}{20} \times 2 = 12.5 + 1.167 = 13.67, \qquad \frac{N}{6} = \frac{160}{6} = 26.67\]

\[Sep_2 = 14.5 + \frac{10.7}{29} \times 2 = 14.5 + .74 = 15.24, \qquad \frac{2N}{7} = \frac{320}{7} = 45.7\]

\[Oct_3 = 14.5 + \frac{25}{29} \times 2 = 14.5 + 1.72 = 16.22, \qquad \frac{3N}{8} = \frac{480}{8} = 60.0\]

\[D_7 = 18.5 + \frac{18}{25} \times 2 = 18.5 + 1.44 = 19.94, \qquad \frac{7N}{10} = \frac{1120}{10} = 112\]

\[P_{33} = 14.5 + \frac{17.8}{29} \times 2 = 14.5 + 1.23 = 15.73, \qquad \frac{33 \times 160}{100} = 52.8\]

Percentiles give a very minute division of a frequency
distribution and usually are used to distribute the items of a
relatively large sample, whether educational test grades, sales
budget quotas, or social or economic measuring scales. Percentiles
serve as a super measuring scale for samples of 500 to 1000 items
or more.

The median and all the related partition values (quartiles,
quintiles, deciles, percentiles, and the rest) are purely
locational values and entirely exclude the size of the individual
sample items. They are, therefore, all less complete measures than
the means.
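Every partition value above uses the same interpolation, so one function covers them all. In this sketch `partition_value(k, parts)` is an illustrative helper (not a library function) that returns the point below which k/parts of the items fall. It reproduces the median, the quartiles, the first quintile, and the sextile, septile, octile, decile, and percentile computed above.

```python
# Worksheet No. 25: wheat yields on 160 farms.
lower_limits = [8.5, 10.5, 12.5, 14.5, 16.5, 18.5, 20.5, 22.5, 24.5, 26.5, 28.5, 30.5]
freqs = [4, 11, 20, 29, 30, 25, 21, 9, 6, 3, 1, 1]
width = 2.0  # class interval i

def partition_value(k, parts):
    """Point below which k/parts of the items fall: L + (p / f) * i."""
    target = k * sum(freqs) / parts       # e.g. N/2, N/4, 3N/8, 33N/100
    cum = 0                               # items below the current class
    for low, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= target:
            p = target - cum              # part of this class's frequency still needed
            return low + p / f * width
        cum += f
    raise ValueError("target lies beyond the distribution")

measures = {
    "Me": partition_value(1, 2),
    "Q1": partition_value(1, 4),
    "Q3": partition_value(3, 4),
    "Qui1": partition_value(1, 5),
    "Sex1": partition_value(1, 6),
    "Sep2": partition_value(2, 7),
    "Oct3": partition_value(3, 8),
    "D7": partition_value(7, 10),
    "P33": partition_value(33, 100),
}
for name, value in measures.items():
    print(f"{name:5} {value:6.2f}")
```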
Graphic Method of Computing Median, Quartiles, Deciles,
Percentiles, and Other Locational Measures

Fig. 44. Graphic method of computing locational measures

Method of Constructing Median, Quartile, Decile, Percentile Chart

1. Set the class intervals on the X-axis, and the cumulated
frequencies on the Y-axis.

2. Plot the accumulated frequencies in column 3, Worksheet 
No. 25, so that the accumulated frequency in each case falls at 
the right-hand end of the appropriate class interval. Four is 
plotted above 10.4, 15 is plotted above 12.4, 35 above 14.4, 64 
above 16.4, and so on to the last figure, or total, 160, which is 
plotted at the end of the last class interval, or above 32.4. 






AVERAGES 


3. The median, quartiles, deciles, and percentiles are located
on the X-axis as follows:

a. For the median, follow up the Y-axis from zero to \(\frac{N}{2}\), or
80 in this case. Place a square so that its point touches the
diagonal frequency curve line and the top edge of the horizontal
arm touches 80, the median frequency on the Y-axis. In this
position the right edge of the vertical arm will cross the X-axis at
the location of the median.

b. For the first quartile, move the square down toward zero
on the Y-axis, being careful to keep the point on the frequency
curve line. When the upper edge of the horizontal arm crosses
the Y-axis at \(\frac{N}{4}\), the vertical arm will cross the X-axis at the
point of the first quartile.

c. To locate the third quartile, the square is moved up and to
the right, keeping the point on the frequency curve, until it
reaches \(\frac{3N}{4}\) on the Y-axis. At this location the vertical arm will
cross the X-axis at the point of the third quartile.

d. Any decile, percentile, or other ratio may be located quickly
by this method. When a frequency distribution is to be divided
into many parts, this is an easy and sufficiently accurate method
for doing it.
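Reading the chart with a square amounts to linear interpolation along the straight segments of the cumulative frequency line, so an accurately drawn chart should agree with the formula values. The sketch below assumes the cumulated frequencies are plotted at the true class boundaries (10.5, 12.5, ...) rather than at 10.4, 12.4, .... With that assumption it reads off the median and all nine deciles at once. `read_curve` is an illustrative helper name.

```python
from bisect import bisect_left

# Worksheet No. 25: frequencies of wheat yields on 160 farms, classes of width 2 from 8.5.
freqs = [4, 11, 20, 29, 30, 25, 21, 9, 6, 3, 1, 1]

# Corners of the cumulative frequency line: (class boundary, items below it).
# Assumption: counts are plotted at the true boundaries 10.5, 12.5, ..., 32.5.
boundaries = [8.5 + 2 * j for j in range(len(freqs) + 1)]
cumulative = [0]
for f in freqs:
    cumulative.append(cumulative[-1] + f)
N = cumulative[-1]

def read_curve(count):
    """X value at which the cumulative line reaches `count` (what the square reads off)."""
    j = bisect_left(cumulative, count)          # first corner at or above `count`
    if j == 0:
        return boundaries[0]
    c0, c1 = cumulative[j - 1], cumulative[j]   # c0 < count <= c1
    x0, x1 = boundaries[j - 1], boundaries[j]
    return x0 + (count - c0) / (c1 - c0) * (x1 - x0)

print(round(read_curve(N / 2), 2))              # median: 17.57
deciles = [round(read_curve(k * N / 10), 2) for k in range(1, 10)]
print(deciles)
```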


MODE 

The mode is the least exact and least stable of all the averages. 
Its primary use is in quickly pointing out the value of the most 
numerous group of items. While the median takes into consid- 
eration the entire number of items, the mode considers only the 
most numerous group. It is especially unstable because a slight 
change in the sample may shift the mode a considerable distance 
as indicated below: 

Mode in Ungrouped Data 
Data: 7, 12, 8, 5, 9, 10, 9 
Array: 5, 7, 8, 9, 9, 10, 12 





The mode is 9, the most numerous group. 

If this sample were changed to 5, 5, 7, 8, 9, 10, 12, the mode 
would be 5. A shift of one item moved the mode from 9 to 5, 
decreasing its size almost 50 percent. If the distribution had 
been 5, 5, 7, 8, 9, 9, 10, 12, there would have been two modes a 
considerable distance apart, both contending for recognition. 
Such bi-modal or even tri-modal distributions are frequent, and 
whenever they occur invalidate the use of the mode. For this 
reason the mode is employed only as a rough measure of the con- 
centration of the data. 
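The three small samples above can be checked with `statistics.multimode` (Python 3.8 or later). It returns every value tied for the largest frequency, so it exposes the bi-modal case directly.

```python
from statistics import multimode

samples = {
    "original": [7, 12, 8, 5, 9, 10, 9],
    "one item changed": [5, 5, 7, 8, 9, 10, 12],
    "bi-modal": [5, 5, 7, 8, 9, 9, 10, 12],
}

# Every value tied for the largest frequency, in order of first appearance.
modes = {name: multimode(data) for name, data in samples.items()}
for name, m in modes.items():
    print(f"{name:17} {m}")
```

A single changed item moves the mode from 9 to 5, and one more item produces two competing modes. This is the instability the text describes.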

Mode by Formula from Class Intervals
Symbols for interpolation formula:

Mo = mode

L = lower limit of the modal class

\(f_1\) = frequency of class interval next smaller in size than
the modal class

\(f_2\) = frequency of class interval next larger in size than
the modal class

i = width of class interval


WORKSHEET NO. 26

Computation of Mode by Interpolation Formula
of Wheat Yields on 160 Farms

Class Intervals,        Frequencies
Bushels of Wheat
i                       f
8.5-10.4                  4
10.5-12.4                11
12.5-14.4                20
14.5-16.4                29 = \(f_1\)
16.5-18.4                30
18.5-20.4                25 = \(f_2\)
20.5-22.4                21
22.5-24.4                 9
24.5-26.4                 6
26.5-28.4                 3
28.5-30.4                 1
30.5-32.4                 1
Total                   160

By inspection or observation the mode is the mid-point of the
modal class, which is the class with the largest frequency. In this
case that is the mid-point of the class 16.5-18.4, or 17.5.
Formula No. 16

\[Mo = L + \frac{f_2}{f_1 + f_2}\,i = 16.5 + \frac{25}{25 + 29} \times 2 = 16.5 + .93 = 17.43 \text{ (bushels)}\]


Mode by Locational Formula
Formula No. 17

\[Mo = \bar{X} - 3(\bar{X} - Me) = 17.85 - 3(17.85 - 17.57) = 17.85 - 3 \times .28 = 17.85 - .84 = 17.01 \text{ (bushels)}\]

The mode by the interpolation formula is 17.43 and by the lo- 
cational formula is only 17.01. The difference is caused by the 
fact that the interpolation formula considers only the two fre- 
quencies joining the mode while the locational formula, based on 
the mean and median, considers all the frequencies. Located by 
any method, the mode is an unstable and incomplete measure 
of central tendency. 

The locational formula is based on the fact that in a perfectly
smooth, symmetrical, or normal frequency distribution such as
Fig. 47A, shown on page 223, all three measures, mean, median, and
mode, fall at the same point, but in a skewed distribution such as
either B or C in Fig. 47 on page 223 the three measures tend to
fall apart. In such an unequal or skewed distribution, the mode
still falls under the largest frequency, but the mean is pulled
toward the tail, or long narrow end of the curve, while the median
falls between them nearer the mean, about one-third of the distance
from the mean toward the mode.
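Both mode formulas are sketched below for the wheat data. The code assumes the modal class has a neighbour on each side (true here). It takes the mean and median, 17.85 and 17.57, from the earlier worksheets rather than recomputing them.

```python
# Worksheet No. 26: wheat yields on 160 farms.
lower_limits = [8.5, 10.5, 12.5, 14.5, 16.5, 18.5, 20.5, 22.5, 24.5, 26.5, 28.5, 30.5]
freqs = [4, 11, 20, 29, 30, 25, 21, 9, 6, 3, 1, 1]
width = 2.0  # class interval i

k = freqs.index(max(freqs))            # modal class: 16.5-18.4, frequency 30
L = lower_limits[k]
f1, f2 = freqs[k - 1], freqs[k + 1]    # neighbours: 29 below, 25 above (assumes both exist)

# Formula No. 16 (interpolation): the mode is pulled toward the larger neighbour.
mode_interp = L + f2 / (f1 + f2) * width

# Formula No. 17 (locational): uses the mean and median found earlier.
mean, median = 17.85, 17.57
mode_loc = mean - 3 * (mean - median)

print(round(mode_interp, 2), round(mode_loc, 2))   # 17.43 17.01
```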

The mode may be located graphically as follows: 




Fig. 45. Graphic method of locating mode


GENERAL SUMMARY OF AVERAGES 

Averages, or measures of central tendency, as they are often called,
differ considerably in method of computation, purpose of use,
characteristics, and limitations. The following characterization of
each one should be carefully studied by the student.

1. Arithmetic Mean 

(a) The arithmetic mean is the most widely used and the most useful 
measure of central tendency in statistics. 

(b) The mean is the most complete and adequate average, because it 
takes into consideration both the total number of items and their size. 
It is the point of balance in the data. 

(c) It is a computed or mathematical measure and can be adequately 
manipulated in algebraic equations. 

(d) Its greatest weakness is that in extremely skewed distributions it 
tends to be pulled too far toward the extreme items, out of the center 
of the distribution, and ceases accurately to represent the large body of 
the data. In such cases the median or even the mode is more representa- 
tive of the data. 

2. Geometric Mean 

(a) The geometric mean has all the strong points and weaknesses of 
the arithmetic mean, and in addition, 

(b) The geometric mean is used to average ratios or percentage changes.
It is, therefore, of great value in making averages of price relative index
numbers. It gives less emphasis to extreme items than does the
arithmetic mean.

(c) By the use of logarithms it is quite easy to compute and may be 
manipulated algebraically. 

(d) It cannot be computed by logarithms for negative numbers or for
series containing zero.

3. Harmonic Mean 

(a) The harmonic mean is more limited in its field of usefulness than 
the geometric mean, but in the domain of time, rate changes, or speeds, 
it is essential in order to obtain correct results. 

(b) The harmonic mean gives still less emphasis to extreme items than 
does the geometric mean. 

(c) It is easy to compute by means of reciprocals, but unless the re- 
ciprocals are carried out to at least seven or eight places, some inaccuracy 
will result. 

4. Median 

(a) The median is an incomplete average, because it takes into con- 
sideration only the number of items and not their size. In locating the 
median the smallest item of data counts for just as much as the largest 
one. It is based on a counting of noses and not on a weighing of minds 
or bodies. 

(b) Since it falls in the middle of the data, in extremely skewed dis- 
tributions, it is more representative of the large body of the data than is 
the mean. 

(c) It is a locational and not a computed average and cannot be manipu- 
lated algebraically. 

5. Mode 

(a) The mode is the least complete and least dependable of all averages. 

(b) It is a locational and not a computed average and cannot be 
manipulated algebraically. 

(c) It is primarily useful only for representing the large central group, 
or most numerous frequency of the data. 

(d) In some distributions there may be two or more modes, bi-modal 
or tri-modal centers of equal concentration, which make the mode largely 
useless in such cases. 

(e) In extremely skewed distributions it is usually too near one end 
of the data to be very representative. 
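The remarks on how much emphasis each mean gives to extreme items can be illustrated with the series 2, 4, 8, 16, 32 used earlier, plus a hypothetical variant in which 32 is replaced by 320. For positive items the arithmetic mean is never smaller than the geometric mean, which in turn is never smaller than the harmonic mean. The sketch checks this ordering and shows how far each mean moves when one item becomes extreme.

```python
from statistics import geometric_mean, harmonic_mean

def three_means(data):
    """Arithmetic, geometric, and harmonic means of a series of positive numbers."""
    return sum(data) / len(data), geometric_mean(data), harmonic_mean(data)

series = [2, 4, 8, 16, 32]
extreme = [2, 4, 8, 16, 320]      # hypothetical: the largest item multiplied by 10

base = three_means(series)
ext = three_means(extreme)
for label, (am, gm, hm) in (("series", base), ("extreme", ext)):
    assert am >= gm >= hm         # holds for any series of positive numbers
    print(f"{label:8} AM {am:6.2f}   GM {gm:6.2f}   HM {hm:6.2f}")
```

The arithmetic mean jumps from 12.4 to 70, the geometric mean rises more modestly, and the harmonic mean barely moves, which matches the order of emphasis given in the summary.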


Selecting an Average 

In analyzing data, the statistician should first study carefully the nature
of his data and the use he wishes to make of it, and then select that
average which will give the best results as to accuracy of representation
and needs for further manipulation. Any technician, to get good results,
must be well acquainted with the material on which he works. It is not
sufficient merely to be an expert with tools. One must know the field in
which those tools are to be applied. A cabinet maker, to succeed well,
must know the nature and structure of the various woods with which
he must work as well as know the exact use of his tools. Statistics is
a method of quantitative analysis. The statistician is a technician. To
compute an adequate, a good, average, or other statistic, he must be
well informed in the field of data in which he works. A biologist who
knows statistics can do much better statistical work in the field of biology
than a statistician who is not acquainted with that science. The same
principle holds for chemistry, economics, sociology, psychology, business
administration, and all other fields of knowledge. A good research man
must know both his statistical tools and the field of data in which he
works. Of all statistical tools, the average is the most basic, essential,
and fundamental.

TABLE 16

Summary Table of All Measures Computed on the Wheat
Yields of 160 Farms

Measure                                          Formula                                    Results
Arithmetic Mean, Ungrouped Data                  \(\bar{X} = \frac{\Sigma X}{N}\)           17.84
Arithmetic Mean, Class Intervals, Long Method    \(\bar{X} = \frac{\Sigma fm}{N}\)          17.85
Arithmetic Mean, Class Intervals, Short Method   \(\bar{X} = A + \frac{\Sigma fd}{N}\,i\)   17.85
Median, Class Intervals                          \(Me = L + \frac{p}{f}\,i\)                17.57
First Quartile                                   \(Q_1 = L + \frac{p}{f}\,i\)               14.84
Third Quartile                                   \(Q_3 = L + \frac{p}{f}\,i\)               20.6
Mode, Interpolation                              \(Mo = L + \frac{f_2}{f_1 + f_2}\,i\)      17.43
Mode, Locational                                 \(Mo = \bar{X} - 3(\bar{X} - Me)\)         17.01
REVIEW QUESTIONS

1. Give three reasons for using averages.

2. Write four formulas for computing the arithmetic mean, and ex- 
plain the differences between them. 

3. Are all arithmetic means weighted or not? Why? 

4. Explain the difference between the geometric mean and the arith- 
metic mean. 

5. Explain why and how the geometric mean is adapted to averaging 
percentage changes. 

6. What is the principal use of the harmonic mean and why does it 
meet this requirement? 

7. What are the differences between the harmonic and geometric
means and the methods of computing them?

8. What are the relative sizes of the arithmetic, harmonic, and 
geometric means? 

9. Which gives the more complete measure of data, the median or 
the mean? Why? 

10. What is the formula for the second quartile? How does it differ 
from the first and third quartiles? 

11. Explain the use and defects of the mode. 

12. For what kind of sample might the median be a more truly 
representative measure than the mean? Why? 

13. What is a decile? What is a percentile? 
14. What is a logarithm, an antilog, a reciprocal, a frequency?

15. On what bases does one decide what particular average to use for 
any specific sample? 

16. Of what use is an average in research work? 


EXERCISES 

1. Compute averages for any or all of the samples given at the end 
of Chapter 6. 

2. Compute averages for Per Capita Income by counties for Alabama,
1929. (Source: Income in Counties of Alabama, 1929-1936, by W. M.
Adamson)

$201, 237, 172, 219, 195, 161, 187, 358, 247, 193, 195, 199, 194, 164, 151,

163, 287, 157, 160, 238, 195, 220, 182, 271, 201, 207, 222, 389, 232, 226, 

178, 178, 180, 190, 249, 172, 616, 189, 252, 187, 237, 213, 155, 168, 274, 

221, 199, 224, 454, 182, 481, 315, 190, 191, 195, 187, 156, 244, 243, 191, 

250, 275, 338, 276, 189, 170, 160. 

3. Compute averages for the following New Hampshire data by counties
from the United States Market Data Handbook, 1929:

            Population                       Bank Deposits   Retail   Number of
  County    Total       Urban      Rural     ($1,000)        Stores   Automobiles
  1          21,178      10,897     10,281    11,573           396      4,419
  2          15,017       3,102     11,915     4,234           308      3,238
  3          30,975      13,763     17,212    12,684           509      6,573
  4          36,093      24,224     11,869    15,693           544      5,591
  5          40,572      13,807     26,765    18,568           682      9,035
  6         135,512     113,161     22,351    99,892         1,974     20,476
  7          51,770      31,048     20,722    45,813           740      9,995
  8          52,498      26,736     25,762    21,897           967     10,332
  9          38,546      29,390      9,156    28,406           636      6,179
  10         20,922      13,633      7,289     7,946           325      4,267




CHAPTER 10 


DISPERSION, SKEWNESS, AND 
VARIATION 


We have now learned to compute an average, a representative 
figure which measures the central value of a frequency distribu- 
tion and may be used to represent it in comparison with other 
frequency distributions. By this method we can compare the 
standing of Class A with Class B, or the wages of carpenters and 
plumbers, or the annual income of physicians and lawyers. It 
is, however, the nature of the ordinary frequency distribution to 
scatter about its mid-point or average. In some distributions 
the items fall very close together, and, therefore, close to the mean. 
In such cases the mean represents the group quite well because it
is the same size or nearly the same size as the other items in the
group. In other distributions the items are widely scattered.
Some of them are very much smaller than the mean, while others
are very much larger. In such cases the average does not represent
most of the individual items very well. This may be illustrated
as follows:

Items Mean 

Group 1 150, 150, 150, 150, 150, 150, 150, 150, 150 150 
Group 2 146, 147, 148, 149, 150, 151, 152, 153, 154 150 
Group 3 16, 25, 50, 112, 150, 188, 255, 370, 384 150 

In Group 1 the mean is 150 which is exactly the same as all the 
other items in the distribution. It, therefore, represents all the 
items in the group perfectly. 

In Group 2 the mean is 150, but only one item in the group, the
middle one, is exactly that size. The mean is, however, so nearly
the same size as the other items that it represents the group quite
well.

In Group 3 a quite different condition exists. The mean is 150
again, but only one item in the group is anywhere near its size.
It is nearly ten times as large as 16, the smallest number, and less
than one-half the size of the largest figure, 384. It is evident at
a glance that this average does not represent accurately the
individual items in its group. The reason for this poor representation
is the wide scatter of this group, from 16 to 384. In Group 1
the scatter about the mean was zero. In the second group it was
nine, 146-154, inclusive. In Group 3 the scatter is 369. This
illustration indicates the need for some measure of the scatter of
the items in a sample about the sample average. The name given
to this scatter is dispersion.


WORKSHEET NO. 27

Dispersion of Grades of Two Classes in Statistics, Oklahoma
A. & M. College, Stillwater, Semester Ending January 23, 1942

Grades (400 points possible)

  Class A (25 students)            Class B (25 students)
          398                              378
          387                              374
          382                              372
          366                              370
          358                              370
          356                              368
          351                              366
          341                              364
          340                              364
          337                              362
          336                              360
          332                              360
          332  (median)                    354  (median)
          324                              352
          323                              350
          321                              350
          320                              346
          318                              344
          317                              342
          316                              338
          313                              336
          310                              336
          306                              334
          300                              332
          288                              318

  Mean            334.8                    353.6
  Median          332                      354
  Grade range     398 - 287 = 111          378 - 317 = 61

  Less representative average;     More representative average;
  wider range of student           narrower range of student
  attainment                       attainment


RANGE

The simplest and largest measure of dispersion is the range. It
is the total inclusive distance from the smallest to the largest
items. If the smallest item is 9 and the largest item is 31, the
range is 31 - (9 - 1), or 31 - 8 = 23. The smallest item, 9, lies
inside the range just as the largest item does, so the range is
23 in this case instead of 22, as one might at first think. If the
data are grouped into class intervals, 8.5-10.4, and so
on up to 30.5-32.4, the range is:

32.4 - (8.5 - .1) = 32.4 - 8.4 = 24.

By either method both the largest and smallest items must be
included. Frequently the range for the same data in grouped
and ungrouped forms will be slightly different.

The range is more widely used at present than formerly as a
measure of dispersion. At the extremes it passes over several
class intervals which have such small frequencies that they are
negligible in value. They are as wide as other classes but their
content is thin. Most of the items in most distributions tend to
fall in a few classes near the center. The range tends to
overemphasize scatter. A narrower measure would usually be more
truly representative of the distribution.

The two statistics classes shown in Worksheet No. 27 illustrate
this point. If we drop off three items from each end of Class A,
the range is reduced from 111 to 57, or almost one-half. If four
items are dropped off each end, the range is only 46. By reducing
the sample 32%, the range is reduced 59%. This overemphasis of
dispersion by the range is especially true in samples of
exceptionally wide range. In Class B, dropping the three extreme items
from each end of the array reduces the range from 61 to 35. From
these points in the array the data are so concentrated that
dropping five items on each end would reduce the range to only 31.


QUARTILE DEVIATION 

The quartile deviation is one-half the distance between the 
third and first quartiles. It is computed by 


Formula No. 18

$$Q_d = \frac{Q_3 - Q_1}{2}$$

For the yields of wheat shown in Chapter 9, the computation is:

$$Q_d = \frac{20.6 - 15.0}{2} = \frac{5.6}{2} = 2.8$$


Such a deviation is used plus and minus, or measured on both
sides of the mid-point between the quartiles. In this case ±2.8,
or the full distance between the two quartiles, is 5.6, the quartile
range. While the full range of the data used above is 23 bushels,
the quartile range is only 5.6. Since fifty percent of any
frequency distribution is included between the two quartiles, in
this case the middle one-half of all the items is clustered in the
narrow space of 5.6 bushels, while the other fifty percent are
scattered over 23 - 5.6, or 17.4 bushels. In other words, the
middle one-half of the distribution lies over roughly one-fourth
of the total range. The quartile range is an excellent
measure of the dispersion of the dense middle half of the data.
The main objection to it is that it is too small. Just outside of it
on either edge is a rather dense area of items, numerous enough to
be of real importance, which it excludes. While the total
range is too broad to be a good measure of dispersion, the quartile
deviation is too narrow. The first takes in too much thinly
populated territory. The second leaves out too much thickly
populated area. The quartile deviation is a locational measure
taking into consideration only the number of items and not their
size.
MEAN DEVIATION

Another measure of dispersion is the mean deviation, based on
the arithmetic mean of the distances of all the items in a
distribution from their median. While the mean deviation may be
computed from the mean as a base, the median is to be preferred in
most cases because the sum of the absolute deviations of the items
from the median is a minimum, and the result is likely to be more
stable and representative in comparing frequency distributions with
various amounts of dispersion. The deviations are averaged without
regard to sign. From ungrouped data it is computed as follows:

Meaning of symbols:

Md = mean deviation

d = (X - Me), the difference between the item and the
sample median

Formula No. 19

$$M_d = \frac{\Sigma |d|}{N}$$

WORKSHEET NO. 28

    X    -   Me   =    d
    82   -   97   =   15
    95   -   97   =    2
    97   -   97   =    0
   104   -   97   =    7
   112   -   97   =   15
                      ---
                       39

$$M_d = \frac{\Sigma |d|}{N} = \frac{39}{5} = 7.8$$



This formula is easy to work, but for a large sample with a 
median involving decimal fractions it is time-consuming and tedi- 
ous. Under such conditions, it is more convenient to use class 
intervals. 
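As a check on Worksheet No. 28, the following Python sketch averages the absolute deviations from the median. The function name is illustrative; the only library call is the standard `statistics.median`.

```python
from statistics import median


def mean_deviation(items):
    """Mean of |X - Me|: deviations from the median, signs disregarded."""
    me = median(items)
    return sum(abs(x - me) for x in items) / len(items)


data = [82, 95, 97, 104, 112]   # the items of Worksheet No. 28
print(mean_deviation(data))     # 7.8  (39 / 5)
```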

Mean Deviation with Class Intervals

Formula No. 20

$$M_d = \frac{\Sigma f|m - Me|}{N}$$
WORKSHEET NO. 29

COMPUTATION OF MEAN DEVIATION OF WHEAT YIELDS ON 160 FARMS

  Class Intervals   Midpoints   Frequencies   Deviations from     Frequencies Times
  in Bushels of                               Median 17.57,       Deviations
  Wheat                                       Disregarding
                                              Algebraic Signs
  i                 m           f             (m - Me)            f(m - Me)

  8.5-10.4           9.5          4             8.07                32.28
  10.5-12.4         11.5         11             6.07                66.77
  12.5-14.4         13.5         20             4.07                81.40
  14.5-16.4         15.5         29             2.07                60.03
  16.5-18.4         17.5         30              .07                 2.10
  18.5-20.4         19.5         25             1.93                48.25
  20.5-22.4         21.5         21             3.93                82.53
  22.5-24.4         23.5          9             5.93                53.37
  24.5-26.4         25.5          6             7.93                47.58
  26.5-28.4         27.5          3             9.93                29.79
  28.5-30.4         29.5          1            11.93                11.93
  30.5-32.4         31.5          1            13.93                13.93
                                160                                529.96

$$M_d = \frac{\Sigma f|m - Me|}{N} = \frac{529.96}{160} = 3.31$$


The quartile deviation is 2.8 bushels, which, when measured
plus and minus from a point half-way between the two quartiles,
covers 5.6 bushels. The mean deviation is 3.31, which, when
measured plus and minus from the median, gives:


17.57 ± 3.31 = 14.26 to 20.88

on the class-interval scale, or X-axis, and includes 90.4 of the
160 farms, or 56% of the total frequency distribution. In a
perfectly normal frequency curve the percentage of the total items
that would fall within ±Md would be about 57.51%.
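Worksheet No. 29 can be reproduced in Python. This sketch assumes, as the 17.57 in the worksheet implies, that the median is interpolated from the stated lower limit of the median class (16.5) with a class width of 2 bushels; the helper names are hypothetical. Using the unrounded median gives the same 3.31.

```python
midpoints = [9.5 + 2 * k for k in range(12)]        # 9.5, 11.5, ..., 31.5
lower_limits = [8.5 + 2 * k for k in range(12)]     # 8.5, 10.5, ..., 30.5
freqs = [4, 11, 20, 29, 30, 25, 21, 9, 6, 3, 1, 1]  # 160 farms
WIDTH = 2                                           # bushels per class


def grouped_median(lower_limits, freqs, width):
    """Interpolate the median inside the class containing the N/2-th item."""
    half = sum(freqs) / 2
    below = 0
    for lower, f in zip(lower_limits, freqs):
        if below + f >= half:
            return lower + (half - below) / f * width
        below += f
    raise ValueError("empty distribution")


def grouped_mean_deviation(midpoints, freqs, centre):
    """Sum of f * |m - centre| divided by N."""
    n = sum(freqs)
    return sum(f * abs(m - centre) for m, f in zip(midpoints, freqs)) / n


me = grouped_median(lower_limits, freqs, WIDTH)
md = grouped_mean_deviation(midpoints, freqs, me)
print(round(me, 2), round(md, 2))   # 17.57 3.31
```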

The mathematical peculiarity of the mean deviation must be 
noted. It violates the basic principles of algebra in that while 
the deviations from all items or mid-points smaller than the 
median are minus in sign and the deviations from all items or 
mid-points larger than the median are plus, these significant
algebraic signs are totally disregarded. They have to be disregarded
if the mean deviation is to be used or even to exist. If the pluses 
were to be subtracted from the minuses or if their algebraic sum 
were taken, the net result would be zero or approximately zero 
in any distribution approaching normal. Instead of getting a full 
measure of the dispersion one would secure only a small residual 
of the dispersion. Since the mean deviation is mathematically 
unsound it does not lend itself to further analysis. This mathe- 
matical handicap of the mean deviation is one of the reasons that 
it is not widely used in statistical studies. It may be used as a 
basis of correlation, but for this and other purposes there is a 
superior measure. One of the chief reasons for teaching the theory 
and method of the mean deviation is that it is a necessary step in 
the development and understanding of this superior measure which 
is called standard deviation. 


STANDARD DEVIATION 

The mean deviation falls between the broad total range and the 
narrow quartile range, including over 56% of the total items. 
Although it is a more truly representative measure of dispersion 
than either of the others, it has two defects: (1) it is
unsound algebraically, and (2) it still leaves outside its limits a
considerable area of the dense portion of the frequency
distribution. The standard deviation has neither of these limitations and
is generally considered by statisticians as by far the best measure 
of dispersion. It is based on squared deviations from the mean. 

Meaning of symbols:

σ = standard deviation

x = deviation between an item of data or a mid-point and
the arithmetic mean
N = number of items in sample

Formula No. 21, used only for individual items

$$\sigma = \sqrt{\frac{\Sigma x^2}{N}}$$

WORKSHEET NO. 30

    X    -   X̄    =     x       x²
    82   -   98   =   - 16     256
    95   -   98   =   -  3       9
    97   -   98   =   -  1       1
   104   -   98   =   +  6      36
   112   -   98   =   + 14     196
                              ----
                               498

$$\sigma = \sqrt{\frac{\Sigma x^2}{N}} = \sqrt{\frac{498}{5}} = \sqrt{99.6} = 9.98 \text{ or } 10.0$$


It will be noted that the mean deviation computed from these 
same data in Worksheet No. 28 was only 7.8 as compared with 
9.98 or 10.0 for the standard deviation which in this case is about 
one-third the range of 31. The standard deviation is larger
because, in computing σ, the x's are squared. This weights the
small deviations such as 1 and 3 by the small numbers 1 and 3, but
weights the large numbers 14 and 16 by the large values, 14 and 16.
This squaring of the deviations gives the items more distant from
the mean the most importance or weight and thus pulls the squared
average farther from the center.
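The computation of Worksheet No. 30 can be sketched in Python. Like the text, it divides by N (not N - 1), so it agrees with the standard library's `statistics.pstdev`; the function name is illustrative.

```python
from math import sqrt
from statistics import pstdev


def standard_deviation(items):
    """Square root of the mean squared deviation from the arithmetic mean (divides by N)."""
    n = len(items)
    mean = sum(items) / n
    return sqrt(sum((x - mean) ** 2 for x in items) / n)


data = [82, 95, 97, 104, 112]
sigma = standard_deviation(data)
print(round(sigma, 2))                    # 9.98
print(abs(sigma - pstdev(data)) < 1e-9)   # True
```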

Next to the arithmetic mean, the standard deviation is the most
useful and most widely used measure in statistical analysis.
These two measures are to the statistician what the ax and cross- 
cut saw are to the woodsman — the basic tools for working up 
his raw materials. 

One point should be emphasized at this place. The standard 
deviation should always be computed from deviations from the 
arithmetic mean instead of the median. When taken from the 
mean it is a minimum; that is, a standard deviation computed 
from the true arithmetic mean is smaller than one computed from 
any other average, and it is more stable and dependable as be- 
tween various frequency distributions. The method illustrated in 
Worksheet No. 30 reveals every detail of the computation and all 
the principles and methods involved, but it is too long and labo- 
rious to use in large series of data. For such problems the data 
should always be thrown into class intervals and the methods 
used in Worksheets Nos. 31 or 32 employed. 
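The minimum property just stated can be illustrated on the five items of Worksheets Nos. 28 and 30. The sketch below, with an illustrative helper `rms_deviation`, measures the root-mean-square deviation about the mean (98) and about the median (97). It then scans whole-number centers from 80 to 114 to show that none does better than the mean.

```python
from math import sqrt
from statistics import mean, median


def rms_deviation(items, centre):
    """Square root of the mean squared deviation about an arbitrary centre."""
    return sqrt(sum((x - centre) ** 2 for x in items) / len(items))


data = [82, 95, 97, 104, 112]
about_mean = rms_deviation(data, mean(data))      # centre 98
about_median = rms_deviation(data, median(data))  # centre 97
print(round(about_mean, 2), round(about_median, 2))   # 9.98 10.03

# No whole-number centre from 80 to 114 gives a smaller value than the mean.
print(all(rms_deviation(data, c) >= about_mean for c in range(80, 115)))  # True
```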
Long Method with Class Intervals

Formula No. 22

$$\sigma = \sqrt{\frac{\Sigma f x^2}{N}}$$

WORKSHEET NO. 31

Computation of Standard Deviation from Class Intervals
by Long Method for Wheat Yields on 160 Farms

  Class Intervals,   Midpoints   Frequencies   Deviations from   Deviations   Frequencies Times
  Bushels of Wheat                             X̄ = 17.85         Squared      Squared Deviations
  i                  m           f             x = (m - X̄)       x²           fx²

  8.5-10.4            9.5          4           -  8.35            69.7225       278.8900
  10.5-12.4          11.5         11           -  6.35            40.3225       443.5475
  12.5-14.4          13.5         20           -  4.35            18.9225       378.4500
  14.5-16.4          15.5         29           -  2.35             5.5225       160.1525
  16.5-18.4          17.5         30           -  0.35              .1225         3.6750
  18.5-20.4          19.5         25           +  1.65             2.7225        68.0625
  20.5-22.4          21.5         21           +  3.65            13.3225       279.7725
  22.5-24.4          23.5          9           +  5.65            31.9225       287.3025
  24.5-26.4          25.5          6           +  7.65            58.5225       351.1350
  26.5-28.4          27.5          3           +  9.65            93.1225       279.3675
  28.5-30.4          29.5          1           + 11.65           135.7225       135.7225
  30.5-32.4          31.5          1           + 13.65           186.3225       186.3225
                                 160                                           2,852.4000

$$\sigma = \sqrt{\frac{\Sigma f x^2}{N}} = \sqrt{\frac{2{,}852.4}{160}} = \sqrt{17.8275} = 4.22$$


The mean deviation for these data (Worksheet No. 29) is only
3.31, but the standard deviation is 4.22 bushels. Measured plus
and minus from the mean, 17.85, it reaches on the class-interval
scale of the X-axis from 13.63 to 22.07 (17.85 ± 4.22) and includes
109.31 of the 160 items in the sample, or 68.32% of the data. In
a perfectly normal frequency distribution of large size, ±σ
includes 68.27% of the items in the distribution. This distribution
of yields of wheat is slightly skewed, which accounts for the
fact that the percentage of items included within ±σ is a
little larger than that to be expected in a smooth distribution.
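A short Python sketch of the long method of Worksheet No. 31, using the midpoints and frequencies of the wheat table. The midpoints are generated as 9.5, 11.5, ..., 31.5, assuming (as the worksheet does) that every class is 2 bushels wide.

```python
from math import sqrt

midpoints = [9.5 + 2 * k for k in range(12)]          # 9.5, 11.5, ..., 31.5
freqs = [4, 11, 20, 29, 30, 25, 21, 9, 6, 3, 1, 1]    # 160 farms

n = sum(freqs)
mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
# Deviations from the true mean, squared and weighted by frequency.
sum_fx2 = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs))
sigma = sqrt(sum_fx2 / n)

print(round(mean, 2), round(sum_fx2, 4), round(sigma, 2))  # 17.85 2852.4 4.22
```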

The above method of computing the standard deviation from 
the true mean is long and tedious because it usually involves long 
decimal fractions, and runs into large numbers. The following 
method is much shorter and should be thoroughly mastered by 
the student. It is accurate, easy to compute, and widely used. 


Short Method with Class Intervals 

If one will duplicate Worksheet No. 16 (page 177) for computing
the arithmetic mean by the short method and add one more
column, the arithmetic mean and standard deviation both
may be computed at the same time with a minimum of labor.

Meaning of symbols: 

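σ = standard deviation computed from the true arithmetic
    mean, or the true standard deviation
s = standard deviation computed from an assumed mean
c = correction for the standard deviation computed from
    an assumed mean
x' = deviation from an assumed mean


Formulas:

Summary Formula No. 23

$$\sigma = i\sqrt{\frac{\Sigma f(x')^2}{N} - \left(\frac{\Sigma f x'}{N}\right)^2}$$

Formula No. 24

$$s = \sqrt{\frac{\Sigma f(x')^2}{N}}$$

Formula No. 25

$$c = \frac{\Sigma f x'}{N}$$

Formula No. 26

$$\sigma = i\sqrt{s^2 - c^2}$$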

The first five columns of Worksheet No. 32 are identical with 
those in Worksheet No. 16, which provides the arithmetic mean 
by the short method. By the simple device of adding a sixth 
column, which is the product of the fourth and fifth columns, the
standard deviation is secured. 
WORKSHEET NO. 32

Computation of Standard Deviation by Short Method with
Class Intervals for Wheat Yields on 160 Farms

  Class Intervals,   Midpoints   Frequencies   Deviations from   Frequencies Times   Frequencies Times
  Bushels of Wheat                             Assumed Mean      Deviations from     Squared Deviations
                                                                 Assumed Mean        from Assumed Mean
                     m           f             x'                fx'                 f(x')²

  8.5-10.4            9.5          4             -4                -16                 64
  10.5-12.4          11.5         11             -3                -33                 99
  12.5-14.4          13.5         20             -2                -40                 80
  14.5-16.4          15.5         29             -1                -29                 29
  16.5-18.4          17.5         30              0                  0                  0
  18.5-20.4          19.5         25             +1                +25                 25
  20.5-22.4          21.5         21             +2                +42                 84
  22.5-24.4          23.5          9             +3                +27                 81
  24.5-26.4          25.5          6             +4                +24                 96
  26.5-28.4          27.5          3             +5                +15                 75
  28.5-30.4          29.5          1             +6                 +6                 36
  30.5-32.4          31.5          1             +7                 +7                 49
                                 160                               +28                718

$$\sigma = i\sqrt{\frac{\Sigma f(x')^2}{N} - \left(\frac{\Sigma f x'}{N}\right)^2} = 2\sqrt{\frac{718}{160} - \left(\frac{28}{160}\right)^2} = 2\sqrt{4.4875 - .0306} = 2\sqrt{4.4569} = 2 \times 2.11 = 4.22$$


Since the x′'s in Worksheet No. 32 on wheat yields are taken in
class intervals, the standard deviation, 2.11, is in class intervals.
This figure, 2.11, means that the standard deviation in this case
is 2.11 class intervals wide. To change this number to units
of the original data, which is bushels in this case, the 2.11 must
be multiplied by the width of the class interval, or 2.

Worksheet No. 32 and the formulas for the arithmetic mean,

$$\bar{X} = A + i\,\frac{\Sigma f x'}{N}$$

and the standard deviation,

$$\sigma = i\sqrt{\frac{\Sigma f(x')^2}{N} - \left(\frac{\Sigma f x'}{N}\right)^2}$$

are among the most useful and widely used in all statistics. With
a minimum of figures, and those of small size, both X̄ and σ
may be quickly computed. The student should master them
fully.
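The short method of Worksheet No. 32 translates directly into code. This is a sketch assuming, as the worksheet does, an assumed mean A = 17.5 (the midpoint of the 16.5-18.4 class) and a class width i = 2, so every deviation x' is a whole number of class intervals.

```python
from math import sqrt

midpoints = [9.5 + 2 * k for k in range(12)]
freqs = [4, 11, 20, 29, 30, 25, 21, 9, 6, 3, 1, 1]
A, i = 17.5, 2                         # assumed mean and class width

# Deviations from the assumed mean in class-interval units: -4 ... +7.
x_prime = [round((m - A) / i) for m in midpoints]
n = sum(freqs)
sum_fx = sum(f * d for f, d in zip(freqs, x_prime))        # +28
sum_fx2 = sum(f * d * d for f, d in zip(freqs, x_prime))   # 718

mean = A + i * sum_fx / n
sigma = i * sqrt(sum_fx2 / n - (sum_fx / n) ** 2)
print(sum_fx, sum_fx2, round(mean, 2), round(sigma, 2))    # 28 718 17.85 4.22
```

Because every x' is a small whole number, the only work with long decimals comes once, at the end; that is the labor-saving point of the short method.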

The relative widths of the four measures of dispersion are clearly
shown in Fig. 46 on page 220. For the distribution of yields on
these 160 wheat farms, they are:

  Range                                 = 24 bushels
  Quartile Deviation      ± 2.8         = 5.6 bushels
  Mean Deviation          ± 3.31        = 6.62 bushels
  Standard Deviation      ± 4.22        = 8.44 bushels


Of course, these particular figures hold only for this sample of
the yields of wheat on these 160 farms, but the relative size of
the measures would remain about the same for any fairly normal
distribution of data. The ratio of the range to the standard
deviation for various sizes of samples is approximately as follows:

  Size of Sample      Range ÷ σ
  30                  4.0
  100                 5.0
  200                 5.5
  500                 6.0
  1,000 and over      6.5


The reason for this variation is that in small samples there is 
little likelihood of getting any of the extremely small or extremely 
large items of the population. The small sample is almost certain 
to be composed of items from the dense middle portion of the 
universe. This limitation would give it a relatively short range. 
In a large sample, however, some of the more extreme items 
would likely be included. This wider distribution would lengthen 
the range. 
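The table of range-to-σ ratios can be checked roughly by simulation. The sketch below assumes a normal population with σ = 1, so the range itself is the ratio, and uses Python's `random.gauss`; the trial count and seed are arbitrary choices, and the printed averages should come out near, not exactly at, the tabled values.

```python
import random
from statistics import fmean


def mean_range_over_sigma(n, trials=400, seed=1):
    """Average range of `trials` normal samples of size n, with sigma fixed at 1."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return fmean(ranges)


for n in (30, 100, 200, 500, 1000):
    print(n, round(mean_range_over_sigma(n), 1))
```

The averages grow with the sample size, which is the point of the paragraph above: larger samples are more likely to catch the extreme items of the population.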





Fig. 46. Relative sizes of (A) Quartile Deviation, 
(B) Mean Deviation, and (C) Standard Deviation 
for yields of wheat on 160 wheat farms 


In a perfectly normal distribution Qd = 0.6745σ and Md =
0.7979σ. These ratios will vary slightly as the frequency curve
departs from normal.
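The two normal-curve ratios can be confirmed with Python's standard-library `statistics.NormalDist`: the quartile deviation of a normal curve is the 75th-percentile z-value, and its mean deviation is σ√(2/π). For comparison, the wheat figures give 2.8/4.22 ≈ .66 and 3.31/4.22 ≈ .78.

```python
from math import pi, sqrt
from statistics import NormalDist

z = NormalDist()                      # standard normal curve, sigma = 1
qd_ratio = z.inv_cdf(0.75)            # Qd / sigma
md_ratio = sqrt(2 / pi)               # Md / sigma
within_one_sigma = z.cdf(1) - z.cdf(-1)

print(round(qd_ratio, 4), round(md_ratio, 4))   # 0.6745 0.7979
print(round(100 * within_one_sigma, 2))         # 68.27
```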

Like averages, all measures of dispersion are expressed in terms 
of the original data; that is, in inches, pounds, tons, dollars, 
bales, meters, percentages, etc. 
QUADRATIC MEAN

The quadratic mean, often called the root-mean-square, in some
respects resembles a standard deviation in methods of computation
but is only an average or mean. The standard deviation is
always based on deviations from the mean. A quadratic mean
may be computed for any type of measurement. It is the square
root of the arithmetic mean of the squares of the values. Since,
when an item is squared, it is weighted by itself, the quadratic
mean gives the greater weight to the larger items. It is,
therefore, larger than the simple arithmetic mean for the same
data unless all the items are equal.

It is not widely used, but it may be employed to measure the 
scatter of items about a fixed point such as rifle shots around a 
“bull’s eye.” How small must the quadratic mean scatter of
the shots be before one is rated a sharpshooter? The data 
shown in Fig. 10 and Worksheet No. 33 are illustrations of this 
measure. 


WORKSHEET NO. 33

Quadratic Mean of Random Sample of 100 Rifle Shots at 50 Feet
Taken from Practice Targets of Oklahoma A. & M.
College R.O.T.C. Basic Military Students,
February 12, 1942

  Class Intervals   Midpoint   Frequencies   Deviations    Frequencies   Frequencies Times
  in Millimeters    of Class                 from 0 in     Times         Squared Deviation
                                             Class Units   Deviation     from Zero
                    m          f             x             fx            fx²

  0- 3.9             2         38             .5           19.0            9.50
  4- 7.9             6         23            1.5           34.5           51.75
  8-11.9            10         14            2.5           35.0           87.50
  12-15.9           14         10            3.5           35.0          122.50
  16-19.9           18          7            4.5           31.5          141.75
  20-23.9           22          4            5.5           22.0          121.00
  24-27.9           26          3            6.5           19.5          126.75
  28-31.9           30          1            7.5            7.5           56.25
                              100                                        717.00

Formula No. 27

$$\text{Quadratic mean} = i\sqrt{\frac{\Sigma f x^2}{N}} = 4\sqrt{\frac{717}{100}} = 4\sqrt{7.17} = 4 \times 2.68 = 10.72 \text{ millimeters}$$
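Worksheet No. 33 can be recomputed directly from the class midpoints in millimeters, which is equivalent to multiplying by the class width of 4 mm at the end. This sketch also prints the ordinary arithmetic mean of the same grouped data to show that the quadratic mean is the larger. Unrounded, the quadratic mean comes to about 10.71 mm; the 10.72 in the worksheet results from rounding √7.17 to 2.68 before multiplying by 4.

```python
from math import sqrt

midpoints = [2, 6, 10, 14, 18, 22, 26, 30]   # millimeters from the bull's eye
freqs = [38, 23, 14, 10, 7, 4, 3, 1]         # 100 shots

n = sum(freqs)
# Root of the mean of the squared distances from zero (not from the mean).
quadratic_mean = sqrt(sum(f * m * m for m, f in zip(midpoints, freqs)) / n)
arithmetic_mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
print(n, round(quadratic_mean, 2), round(arithmetic_mean, 2))  # 100 10.71 8.16
```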

Two characteristics of the data have now been measured: its
central point of concentration, or average, and its tendency to
scatter about that average, or its dispersion. Three other measures
are necessary to complete the analysis and description of a fre- 
quency distribution. 


SKEWNESS

Skewness is the tendency of a distribution to depart from nor- 
mal in the balance of its two sides, or to be lopsided. Perfectly 
normal distributions are rare except in perfectly random and fair 
games of chance and in the pure physical sciences. In the field 
of biology and related sciences, distributions tend to approach 
normal, but in economics, sociology, political science, and wherever 
human activities and institutions are concerned, distributions 
tend to be skewed. 

Meaning of symbols: 

Sk = degree of skewness 

Formula No. 28

$$Sk = \frac{Q_3 + Q_1 - 2Me}{Q_3 - Q_1}$$

This formula is based on the fact that in a perfectly smooth or
balanced distribution the numerator would be zero, or no
skewness. The relationship in a perfectly normal distribution is that
the distance of either quartile from the median is the same. That
is:

$$Q_3 - Me = Me - Q_1$$

transposing,

$$Q_3 + Q_1 - Me - Me = 0$$

$$Q_3 + Q_1 - 2Me = 0$$
Fig. 47. Forms of normal and skewed distributions (panel A: Normal Distribution; the mode is marked)


The denominator is the inter-quartile range, Q3 - Q1. For the data on
the 160 farms the computation is:

$$Sk = \frac{20.6 + 15.0 - 2 \times 17.57}{20.6 - 15.0} = \frac{.46}{5.6} = .082$$

With this formula a figure of .20 or less is a small or moderate
amount of skewness. A figure of .40 or more is a large degree of
skewness.
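Formula No. 28 as a one-line Python function, applied to the wheat quartiles and median. The second call uses made-up symmetric quartiles, purely to show that balanced quartiles give zero.

```python
def quartile_skewness(q1, me, q3):
    """Formula No. 28: (Q3 + Q1 - 2Me) / (Q3 - Q1); zero for a balanced curve."""
    return (q3 + q1 - 2 * me) / (q3 - q1)


print(round(quartile_skewness(15.0, 17.57, 20.6), 3))  # 0.082
print(quartile_skewness(10.0, 15.0, 20.0))              # 0.0 (symmetric quartiles)
```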
Formula No. 29

$$Sk = \frac{3(\bar{X} - Me)}{\sigma}$$

This formula is based on the fact that when a distribution
departs moderately from normal, the mean, median, and mode are
pulled apart roughly in the proportion of 1 to 3; that is, the
distance between the mean and median is about one-third the
distance between the mean and mode. The mean always remains
nearest the tail or skew of the distribution. The mode remains
under the highest frequency and the median still divides the area
of the distribution into two equal parts. The standard deviation
is used as the base of variation. For the data on the 160 wheat
farms, the computations are:

$$Sk = \frac{3(17.85 - 17.57)}{4.22} = \frac{3(.28)}{4.22} = \frac{.84}{4.22} = .199$$

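Formula No. 29 in the same style, using the mean, median, and standard deviation found earlier for the 160 wheat farms. The positive result indicates that the mean lies to the right of the median, toward the longer tail.

```python
def pearson_skewness(mean, me, sigma):
    """Formula No. 29: 3(mean - median) / standard deviation."""
    return 3 * (mean - me) / sigma


sk = pearson_skewness(17.85, 17.57, 4.22)
print(round(sk, 3))   # 0.199
```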
KURTOSIS 

A fourth measure of frequency distributions to test how nearly
they conform to the normal curve is kurtosis, which indicates
whether the distribution is more flat-topped or more peaked than
the normal. The formula and worksheet for an accurate meas- 
urement of kurtosis are rather involved and difficult for elemen- 
tary students. Since most frequency distributions do not have a 
sufficiently high degree of kurtosis to invalidate or seriously 
modify their statistics, the measurement of kurtosis may be com- 
pletely disregarded in all but the most advanced studies. 

COEFFICIENT OF VARIATION 

This fifth measure of frequency distributions is only a ratio of
the first two, the arithmetic mean and the standard deviation.
It measures the relative scatter of two or more samples about
their means.

The coefficient of variation, however, is stated in the form of a
percentage and not in terms of the original data. It is especially
useful in making comparisons between distributions which have
different means and different standard deviations. For instance,
it enables one to compare the relative scatter of the heights of 1st-
and 6th-grade children around their respective mean heights.



               Mean       Standard     Coefficient of
               Heights    Deviation    Variation
  1st grade    43.5       3.1          100 × 3.1 / 43.5 = 7.1%
  6th grade    58.2       4.9          100 × 4.9 / 58.2 = 8.4%


These coefficients show that in proportion to the size of their 
mean heights 6th-grade children vary more in height than do 1st- 
grade children. 

When, however, comparisons are attempted between distribu- 
tions which are expressed in entirely different units, such as 
heights and weights of school children, the great value of the co- 
efficient of variation is more evident. Since the mean and stand- 
ard deviation in one case are expressed in inches and in the other 
case in pounds no direct comparisons of scatter can be made. But 
when both are reduced to percentages the comparison is possible.

                      X̄        σ        V (%)

Heights (inches)      52.58     4.75     9.0

Weights (pounds)      66.00    20.28    30.7

This comparison makes it at once evident that children, in 
proportion to the mean, vary more than three times as much in 
weight as they do in height. 
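The coefficient of variation is simply 100σ/X̄. The sketch below recomputes the four coefficients quoted above; the labels are only descriptive.

```python
def coefficient_of_variation(mean, sigma):
    """V = 100 * sigma / mean, a percentage free of the original units."""
    return 100 * sigma / mean


examples = {
    "1st-grade heights": (43.5, 3.1),
    "6th-grade heights": (58.2, 4.9),
    "heights (inches)": (52.58, 4.75),
    "weights (pounds)": (66.00, 20.28),
}
for label, (mean, sigma) in examples.items():
    print(label, round(coefficient_of_variation(mean, sigma), 1))
```

The printed values are 7.1, 8.4, 9.0, and 30.7, matching the tables.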

Only such a device would make possible a comparison of distributions
as diverse as the yields of wheat and the building permit
valuations indicated above. Distributions with identical means
and different standard deviations expressed in the same units may 
be compared directly as to relative scatter about their means, but 
all other types of distributions are best compared by means of 
their coefficients of variation. It reduces the diverse physical units 
to a common denominator. 
SUMMARY

1. The range includes all the data from the smallest to the largest item. 
It tends to be smaller in small samples (25 to 30 items) than in large 
samples (100 or more items), but in any case it is not a good measure of 
dispersion, because on its extremities it covers a long distance of very
sparse data. It is too large to be an accurate measure of dispersion. It
is also subject to a large variation from chance sampling. It is a loca- 
tional value. 

2. The quartile range includes the middle one-half of the sample items.
The chance of an item being inside or outside is 50-50. It is too small
to be a good measure of dispersion. It leaves out too much of the densely
populated area of the normal curve. It is a locational value. 

3. The average deviation measured plus and minus is a wider measure 
of dispersion than the quartile range, but it has the disadvantage of 
being algebraically unsound in that its computation requires the disre- 
garding of all plus and minus signs. It may be used as a basis of corre- 
lation but because of other limitations it is not widely used. It is a 
computed value. 

4. The standard deviation is the square root of the sum of the squared 
deviations from the mean divided by the number of items. It is algebrai- 
cally sound and is the most stable and widely used measure of dispersion. 
It is a computed measure. It is used as the basis for correlation and most 
computations of statistical error. The student should master it thoroughly. 

5. Skewness is the measure of the lopsidedness of a frequency
distribution. Distributions may be skewed either to the right or to the left. The
skew is in the direction of the long narrow end or tail of the distribution.
Distributions in the physical and biological universe tend to be more
nearly normal than social data, which are usually skewed to the
right. If the skewness is quite large it invalidates the use of the arith- 
metic mean and standard deviations as accurate measures of the dis- 
tribution. In such cases the median or the mode and the quartile range 
are more representative of the large body of the data. 

6. The coefficient of variation measures the relative scatter of the items 
of a sample about their mean as compared with other distributions. Some 
distributions are closely confined about their mean; others are widely 
scattered. The narrower the scatter of the items about the mean, the 
more representative of the items is the mean. 
REVIEW QUESTIONS

1. Define dispersion. Why is it necessary to measure dispersion in 
order to make complete comparisons of frequency distributions? 

2. What is the range? What are its advantages and disadvantages 
as a measure of dispersion? 

3. What is the difference between the quartile range and quartile 
deviation? How much and what portion of the distribution do they 
measure? 

4. Compare the length of the quartile range with the total range. 

5. What are the advantages and disadvantages of the mean deviation? 

6. Explain the difference in the methods of computing the mean de- 
viation and the quartile deviation. 

7. What is the difference between the mean deviation and the stand- 
ard deviation? 

8. Explain the difference between the long method and the short 
method of computing the standard deviation. 

9. Compare the standard deviation with the other measures of dis- 
persion as to ease of computation, adequacy of measurement of disper- 
sion, and algebraic manipulation. 

10. What is skewness? In what ways may it be measured? Of what 
practical use is its measurement? 

11. What is the coefficient of variation and for what is it used? 

12. In what units is the standard deviation always measured? 

13. If a distribution is highly skewed, what average is often better 
than the mean? 

14. What do “skewed to the left” and “skewed to the right”
mean?




228 DISPERSION, SKEWNESS, AND VARIATION 


EXERCISES 

1. Compute measures of dispersion for data in the Exercises at the end 
of Chapter 6 and Chapter 9. 


Class Intervals    Frequencies
    5- 9.9              3
   10-19.9             11
   20-29.9             16
   30-39.9             12
   40-49.9              4

Class Intervals    Frequencies
    2.5- 7.4            6
    7.5-12.4           17
   12.5-17.4           25
   17.5-22.4           15
   22.5-27.4            2

Class Intervals    Frequencies
    0- 2.9             17
    3- 5.9             11
    6- 8.9              7
    9-11.9              5
   12-14.9              4
   15-17.9              2
   18-20.9              1

Class Intervals    Frequencies
   .045-.054            7
   .055-.064           29
   .065-.074           16
   .075-.084            8
   .085-.094            3
   .095-.104            2
   .105-.114            1
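For these exercises, the usual grouped-data method can be sketched in Python. The sketch assumes that every item in a class lies at the class midpoint, and that the measurements were recorded to the nearest 0.1, so the midpoint of 2.5-7.4 is 4.95. It divides by N, matching the definition of the standard deviation given earlier. It illustrates the method only and does not reproduce the text's own worksheet.

```python
import math


def grouped_mean_and_sd(classes):
    """Mean and standard deviation (dividing by N) of a frequency distribution.

    Each class is (lower stated limit, upper stated limit, frequency), and every
    item is treated as if it lay at the midpoint of its class interval.
    """
    n = sum(f for _, _, f in classes)
    midpoints = [((low + high) / 2, f) for low, high, f in classes]
    mean = sum(m * f for m, f in midpoints) / n
    variance = sum(f * (m - mean) ** 2 for m, f in midpoints) / n
    return n, mean, math.sqrt(variance)


# Second distribution above, as (lower limit, upper limit, frequency).
table = [(2.5, 7.4, 6), (7.5, 12.4, 17), (12.5, 17.4, 25),
         (17.5, 22.4, 15), (22.5, 27.4, 2)]
n, mean, sd = grouped_mean_and_sd(table)
print(n, round(mean, 2), round(sd, 2))
```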
LINEAR REGRESSION


Regression involves a comparison between two or more variables.
In all the statistics computed up to this point one variable only 
has been measured at a time. Means have been computed for 
height of school children, cost of housing units, wheat yields, 
train speeds, price relatives and other data, but in no case, as 
yet, have there been any comparisons made between two series. 
Each one has stood absolutely alone. Standard deviations have 
been computed for several variables, but no comparisons have 
been made between any of these series of data. In fact, the com- 
putation of averages and dispersions is limited to the measurement 
of single variables. Fundamental and important as are all of 
those methods, they do not extend to the measurement of the relations
among two or more series of related data. One of the more
common methods of statistical analysis used to measure relations
between two variables is called regression.

Immediately after one has computed the mean and standard 
deviations of the weight and height of children, other questions 
arise. Is there any relationship between the height of a child 
and his weight? Is a tall child likely to weigh more than a short 
one? How accurately can one estimate the weight of a child if 
his height is known? How accurate, or dependable, would such 
an estimate be? How much error on the average is there in such 
estimates? Is there any scientific basis for these age-height- 
weight charts by which mothers measure the growth of their 
babies from birth to school age and which teachers use to de- 
termine whether school children are normal or are underweight or 
overweight? When is a child underweight or overweight? What 
percentage of his weight may be accounted for by his height? 
What percentage of children’s weights may be explained by their 
heights, and what part of their weights may have to be accounted 
for by age, or heredity, or family characteristics, or by proper and 
sufficient food, or undernourishment? These and many other 
questions must be answered before our analysis of these data is 
complete or even sufficient for ordinary purposes. 

Many questions arise concerning the yields of wheat and farm 
income. What is the relationship between the two? Is a yield 
of one bushel or five bushels as likely to be associated with high
income as a yield of twenty bushels? How much does it cost to
produce a bushel of wheat? Does the cost per bushel decline as 
the yield increases? What is the regression between these vari- 
ables? 

What is the relationship between demand for a good and its
price? Or between its supply and price? How many more potatoes
would be sold at $.50 a bushel than at $1.00 a bushel? What is 
the effect on consumption of doubling the tax on cigarettes? 
These and thousands of other variables must be compared to 
answer the problems of management and science. 

Regression measures are the means of answering many of these 
questions. Regression is the measure of the average relationship
between two or more variables in terms of the original units of the
data. It is generally recognized as the most important single
field of statistical analysis. It is the basis of correlation analysis,
covariance analysis, and of many important measures of error.
It is the stuff that science is made of. It furnishes an accurate 
basis for the comparison of variables, of series of data, of signifi- 
cant relationships. It ferrets out and expresses in clear, meas- 
urable terms the hidden and inner meanings of great masses of 
numbers which otherwise would remain unknown. Comparisons 
are the basis of all organized knowledge and of all human institu- 
tions. Every day of our lives, we are forced to make comparisons 
and far-reaching decisions based on comparisons. Shall we go to 
town or to school if it rains? What will be the effect on my grades 
if I reduce my study time for social pleasures or athletic exercises 
or student activities? If I take eighteen hours instead of fifteen, 
what will be the effect on my grade points? If I reduce my hours of
sleep from eight to five, what will be the effect on my efficiency? 
What are the chances that marriages of persons under twenty 
years of age will be as satisfactory and enduring as those made 
after twenty-five? What is the relationship between the prices the 
farmer receives and the prices the consumer pays? What is the 
relationship between the speed of driving and the time and dis- 
tance required to stop the car? In the hurry of daily life we an- 
swer these and a thousand other vital questions as best we can 
and go on with the job. The methods of regression in statistics 
give us more exact and dependable answers. 

Science may be defined as a statement of observed uniformities 
and the relationships among them. Regression is the basic tool of 
analysis in all science. Chemistry, physics, astronomy, as well 
as psychology, economics, sociology, and political science are no 
more than statements of uniformities which have been observed 
concerning phenomena and the regressions which have been meas- 
ured among them. In a more popular sense we may call re- 
gression a trend, a line which shows how many units of change in 
one variable are associated with one unit of change in another 
variable. If one boy is six inches taller than another, how many 
more pounds should he weigh? If he is two years older than the 
other boy, how much taller should he be? Is there any reliable method
by which weight can be estimated from height? Is it possible to 
estimate the incomes of farms from the yields of wheat or cotton 
or hay per acre? Such possibilities are always present whenever 
two variables are in close association. For instance, the weight 
of children might increase 3.5 pounds for each increase of 1.0 inch 
in height; or the income of wheat farms might increase 1.0% for 
each increase of 3.0 bushels in yield, or the price of potatoes in 
the general market might decline 1.0 cent for each increase of 
10,000,000 bushels in supply. 
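The slope examples above come down to simple multiplication. Along a straight regression line, the estimated change in the dependent variable equals the slope times the change in the independent variable. A minimal sketch using the text's illustrative figure of 3.5 pounds per inch (the function name is hypothetical):

```python
def estimated_change(slope, change_in_x):
    """Change in the dependent variable for a given change in the independent variable."""
    return slope * change_in_x


# Illustrative slope from the text: 3.5 pounds of weight per 1.0 inch of height.
weight_per_inch = 3.5
print(estimated_change(weight_per_inch, 6))  # 21.0 pounds for a boy six inches taller
```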


INDEPENDENT AND DEPENDENT VARIABLES 

When comparisons are made between two variables, one of them 
is called the independent variable and the other is designated the 
dependent variable. The independent series is the one which
varies, or is thought of as varying, independently, such as height, yields,
supply, heat, or speed. The dependent series is the one which 
changes or is thought of as changing as a result of change in the 
independent series, such as weight, income, price, expansion, or 
distance traveled when associated with the variables listed in the 
previous sentence. It is not at all necessary that there be a 
causal relationship between the independent and dependent 
variables. All that is required is that the amount of change in the 
dependent be considered in relation to a unit of change in the 
independent variable. The dependent series is always measured 
on the vertical or Y-axis and the independent series on the
horizontal or X-axis.


REGRESSION BASED ON AVERAGES 

In measuring regression or trend relationship between two 
variables the result is based on the average of all the pairs of data in 
the sample. Regression cannot be computed on a single pair of 
items any more than a mean or standard deviation can be com- 
puted for one datum. As a mean is an average of one series of 
data so a regression line is an expression of the average relation- 
ship between the paired items of two related series of data. There 
can be no regression between the height and weight of one child. 
When the sample is increased to include two or more pairs of items 
the measure of their average relationship is regression. 


METHODS OF COMPUTING REGRESSION LINES 

In this chapter three methods of computing regression lines 
will be explained: (1) freehand lines, (2) lines through class
averages, and (3) the method of least squares. The first two
methods require little or no mathematics and are used largely for 
purely descriptive purposes or for preliminary analysis. For more 
complete and adequate analysis and inference the more mathe- 
matical least squares method is employed. 
Freehand Curves

The first step in fitting any regression line is to plot the data
on coordinate graph paper, or on any paper on which one can set
up the X-axis and Y-axis and measure abscissas and ordinates, as
indicated in Figs. 48, 49, and 50. These were plotted from the
paired values of height and weight, age and height, and age and
weight in the data of Worksheet No. 1.

Fig. 48. Free-hand curve showing regression between heights and
weights of 106 Stillwater grade-school children (weight in pounds
on the vertical axis). (Data from Worksheet No. 1)

Figure 48 shows a freehand curve which expresses the average
relationship between the height on X and the weight on Y of the
106 school children. This line is a smooth curve which is rela-
tively flat for the smaller values of X but rises more rapidly for
the larger X values. This indicates that weight does not change
in strict proportion to height: as children grow taller, each
additional inch of height tends to be accompanied by a larger
gain in weight. This would be expected, since weight depends on
all three dimensions of the body while height measures only one.

Fig. 49. Free-hand curve showing regression between heights and
ages of 106 Stillwater grade-school children (height in inches on
the vertical axis). (Data from Worksheet No. 1)

Figure 49 shows the relationship between age and height for
the same children. This curve is drawn on the graph freehand.
Although it is almost straight, it rises more steeply for the earlier
ages and more slowly as the children grow older. This shows that
these children, on the average, increased in height more slowly as
they approached maturity.

Fig. 50. Freehand curve showing regression between weights and
ages of 106 Stillwater grade-school children (weight in pounds on
the vertical axis; age in months, 70 to 180, on the horizontal axis).
(Data from Worksheet No. 1)

Figure 50 shows the relationship between age and weight. This
line, also drawn freehand, is a perfectly straight line. In all three
graphs there is no pre-judgment or assumption of any particular
relationship between the variables. The data are simply plotted
on an accurate scale, and a freehand line is drawn which describes
their general trend or movement as nearly as possible. Such a
method is purely descriptive, but it is quite useful in indicating
the general type of mathematical line to be used if one is to be
fitted. This is important because all mathematical lines rest on
pre-judgments or assumptions of relationship which may be either
correct or incorrect, depending on the amount of information
available before the decision is made. If properly done, the
freehand curve is an aid in deciding what type of mathematical
line to use.

Technique for Drawing Freehand Curves 

1. Plot the data on coordinate graph paper with one variable
measured on the X-axis and the other variable on the Y-axis.

2. Hold a tight string or a slender rubber band over the plotted
points so that it divides them into two equal groups, with as many
points above the string or band as below it.

3. If a straight line does not correctly represent the relation- 
ship, shift the position of the string, turning it along the ends of 
the plotted data until you form a clear idea of the approximate 
degree of curvature to give the line. 

4. First draw in a light line with pencil so that it splits the data 
in the middle from one end of the X-axis to the other. By gentle 
erasures and re-drawing, adjust the line until it is the best you 
can make. Then draw in a heavier line with ink. 

The freehand curve has the following advantages: 

1. It is easily and quickly obtained. 

2. For purely descriptive purposes it is sufficiently accurate. 

Its disadvantages are: 

1. It is not represented by any algebraic equation. 

2. Since it is freehand, the same person at another time, or 
another person, would not exactly reproduce the same line. 

In spite of these limitations, the freehand curve is very useful 
in preliminary analysis. It is an easy means of determining the 
type of mathematical regression that should be worked out if a 
more complete analysis is attempted later. With a little prac- 
tice many persons become exceedingly adept in drawing freehand 
curves. 

Lines through Class Averages 

By this method the data are divided into two approximately 
equal parts. The division does not have to be absolutely equal, 
but a division near the median of the X-variable is preferable.
The steps to be followed are:

1. Arrange the items of the independent or X-variable in an
array in the right-hand column, with the associated Y-value in an
adjoining column. The paired X- and Y-values cannot be separated,
so this requirement will leave the Y-values in an irregular order.

2. Divide the columns into two groups somewhere near the median
of X.

3. Compute the simple mean of the X-values and of their associated
Y-values in each group.

4. Plot the original data on coordinate graph paper.

5. Locate the two points given by the pairs of class averages on
the graph of plotted data and draw a straight line through these two
points with a ruler.

This method will frequently give a quite accurate or representa-
tive regression or trend line, as is indicated in Worksheet No. 34
and Fig. 51. The class average line in Fig. 51, however, was not
computed in Worksheet No. 34, which gives only a straight line; it
was taken from another worksheet in which five class intervals were
used.
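Steps 1 to 5 can be sketched in Python, with the plotting replaced by computing the straight line through the two class averages. This is a sketch under stated assumptions. It splits the array exactly in half, whereas the text asks only for a division near the median, and it assumes the two class means of X differ. The function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def line_through_points(p1, p2):
    """Intercept a and slope b of the straight line Y = a + bX through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b


def two_class_average_line(pairs):
    """Regression line through the means of the lower and upper halves of the data."""
    # Step 1: array the pairs on X; each Y stays with its own X.
    ordered = sorted(pairs, key=lambda pair: pair[0])
    # Step 2: divide at the middle of the array (near the median of X).
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[half:]
    # Step 3: simple means of X and of the associated Y in each class.
    lower_point = (mean([x for x, _ in lower]), mean([y for _, y in lower]))
    upper_point = (mean([x for x, _ in upper]), mean([y for _, y in upper]))
    # Step 5: the straight line through the two class averages.
    return line_through_points(lower_point, upper_point)


# The seven pairs (X, Y) used in the least squares example of this chapter.
pairs = [(5, 7), (4, 5), (9, 10), (8, 8), (2, 4), (3, 6), (7, 9)]
print(two_class_average_line(pairs))

# The class means computed in Worksheet No. 34.
a34, b34 = line_through_points((48.52, 53.35), (56.20, 76.33))
print(round(a34, 2), round(b34, 2))  # -91.83 2.99
```

The worksheet itself split the heights between 52 and 53 inches, which gave classes of 52 and 54 children rather than two exact halves. With its class means, the sketch gives a slope of about 2.99 pounds per inch.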


WORKSHEET NO. 34

Computation of Regression Line Through Two Class
Averages for Heights and Weights of 106 Children

            X, 41-52                  X, 53-66
       Height X   Weight Y       Height X   Weight Y
          41         41             53         61
          43         38             53         57
          43         46             53         61
          44         42             53         82
         ...        ...            ...        ...
          52         61             66        102

Sums     2523       2774           3035       4122
Items      52         52             54         54
Means   48.52      53.35          56.20      76.33
Fig. 51. (A) Regression line through class averages, and
(B) least squares regression line for height and weight of
106 school children (height in inches on the horizontal axis;
weight in pounds on the vertical axis)


Even with five class intervals the class average line in Fig. 51 is
almost straight, showing only a slight curvilinear trend. Using only
two class interval averages, the class average regression line would
be straight and almost identical with the least squares line.
Interpretation of Freehand and Class Average
Regression Lines 

A freehand curve is largely a descriptive device. Just as the
plotted points of the data on the coordinate graph picture, to some
extent, the average relationship between the two variables, so the
freehand curve drawn through the center of those points describes
this relationship further, in a more concrete summary form. The
only inference which may logically be drawn from a freehand curve
is that it represents the true regression in the population (1) to
the extent that the sample truly represents the population, and
(2) provided that the freehand curve correctly measures the actual
regression in the sample. Error in either respect will, to the extent
that it is present, invalidate any conclusions drawn from the
freehand curve. (1) If the sample does not adequately represent
the population, the sample freehand curve cannot adequately
represent the regression in the population. (2) If the freehand
curve does not correctly measure the regression in the sample, it
cannot correctly measure the regression in the population even
though the sample is adequate. One or both of these errors will
always be present; it is the purpose of the statistician to keep them
as small as possible. The advantage of the freehand curve is that it
is an inexpensive descriptive device which can often be made to
give a good picture of the trend in the data.

Assumptions under Mathematical Curves 

Most mathematical curves used in statistics are based on as- 
sumptions or pre-judgments as to their appropriateness to meas- 
ure the relationship under consideration. It is possible to apply 
a dozen or even a hundred different mathematical curves to one 
sample of data. Since these mathematical curves are quite diverse 
in shape and functions, it is logical that if one of them does fit the 
data well the others cannot. Of course, none of the ones tried 
may fit. Outside of the fields of astronomy, atomic chemistry, 
quantum physics and electronics, data are usually much too com- 
plex to coincide with purely theoretical mathematical functions. 
In the fields of biology, agriculture, education, economics,
sociology, and political science no such basic theoretical mathematical
functions have yet been discovered or developed. Whenever life 
and protoplasm form part of the data, and especially whenever 
human intellect, judgment and prejudices become part of the data, 
and still more especially whenever human institutions with all 
their social lag and traditions enter the data, the subject matter 
for measurement and analysis becomes so very complex that a 
workable, purely theoretical, mathematical attack on the problem 
becomes largely or entirely impractical and useless. 

Under such circumstances the statistician enters an experi- 
mental and empirical field in which he often has to resort to 
methods of experimentation which in their initial stages may
decline to the level of “cut and fit” or “successive approximations.”

An elementary statistics text is not a proper place to carry any 
conflict between purely theoretical mathematics and the exi- 
gencies of empirical research to a final solution. The most that 
is desired is to point out clearly that in most statistical work, 
many mathematical regression lines could be applied to the same 
data and that perhaps few if any of them would fit well, and that 
it is often necessary to try several before a satisfactory fit is 
found. This does not mean that in many cases a basic functional 
relationship may not be discovered which a specific mathematical 
function measures closely or even perfectly, but it does mean that 
in most cases, in the social sciences at least, such functions are not 
known and that the likelihood of their early discovery declines 
with the increasing complexity of the data. The elementary 
student in social and even in biological statistics must, therefore, 
rely largely on empirical experimental methods in deciding what 
kind of regression line to employ for any given data. For the 
simpler economic changes such as variations of business activity 
with the time of year or of growth over a period of time, basic 
mathematical functions are more easily discovered than for the 
more complex business cycle or social evolution. Mathematical 
economists have made some progress in discovering and measuring 
the more simple economic functions. Further progress in this 
field is to be expected. 


METHODS OF COMPUTING LINES 
Assumptions under Class Average Lines

When a statistician decides to divide his data into only two 
classes he automatically pre-determines a straight trend line, be- 
cause the two averages limit the result to a straight line. The 
postulate, “Two points determine a straight line,” is well known.
If, however, the data are divided into three or more classes with 
paired averages for each of the several classes, the regression line 
may be a smooth curve among the averages or a series of straight 
lines from one class mean to the next. In deciding upon more 
than two classes the statistician does not pre-determine a curved 
line of any particular function, but merely makes possible the re- 
vealing of the actual conditions and relations existing in the data. 
If the relationship is actually a straight line, the use of more than 
two classes will give the straight regression. If, however, the 
actual relationship is not straight, the use of more than two 
classes will give a more or less true picture of what the true rela- 
tionship is. The use of two class averages presumes a straight- 
line regression. The use of more than two classes does not pre- 
sume any particular relationship but merely indicates what does 
exist. This point is treated more fully in the chapter on curvi- 
linear regression. 
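The point about using more than two classes can be sketched in Python. Array the pairs on X, split them into k classes of nearly equal size, and compare the slopes of the straight pieces that join successive class means. Equal slopes suggest a straight-line regression, and steadily changing slopes suggest curvature. The data and function names are hypothetical, and the sketch assumes k is no larger than the number of pairs.

```python
def class_means(pairs, k):
    """Means of X and Y in k classes of nearly equal size, after arraying on X."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    n = len(ordered)
    bounds = [i * n // k for i in range(k + 1)]  # assumes 1 <= k <= n
    points = []
    for start, stop in zip(bounds, bounds[1:]):
        group = ordered[start:stop]
        points.append((sum(x for x, _ in group) / len(group),
                       sum(y for _, y in group) / len(group)))
    return points


def segment_slopes(points):
    """Slope of each straight piece joining one class mean to the next."""
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]


straight = [(x, 2 * x + 1) for x in range(1, 10)]  # hypothetical straight-line data
curved = [(x, x * x) for x in range(1, 10)]        # hypothetical curved data

print(segment_slopes(class_means(straight, 3)))  # equal slopes: no curvature
print(segment_slopes(class_means(curved, 3)))    # rising slopes: curvature
```

With three classes the straight data give two equal slopes of 2, while the curved data give slopes of about 7 and then 13. This is the kind of change that two class averages alone could never reveal.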

Since it is often quite difficult to determine beforehand what 
kind or type of mathematical line to fit to the data, it is usually 
best either to fit a freehand curve or the curve through class 
averages first, and by means of this curve decide the type of 
mathematical curve required. Such a procedure will make un- 
necessary the difficult task of experimenting with several ex- 
pensive mathematical curves before the correct one is discovered 
and will secure a better fit. If all that is desired is to describe 
one set of data without any thought of setting up a generalization 
for all similar data, that is, if it is a special case, the class average 
curve is as good as or better than any other for describing it. If,
however, it is thought that the relationship existing in any sample
is representative of a large group of data and is a permanent and 
universal principle having a wide application, the mathematical 
expression of this relationship in a formula is of great value. The 
formulas of the physical and biological sciences and, to a somewhat
lesser extent, those of the social sciences are good examples.
Illustrations of this point are the law of gravitation, the formula
of falling bodies, Boyle’s law of gases, the laws of electricity, and
the formula for capitalizing income as presented in economics.

Mathematical Regression Lines 

Whenever the relationship between two or more variables can 
be adequately expressed by a mathematical regression line, it is 
preferable to any other type of line because it is more exact and 
measures the relationship in a formula which can be manipulated 
algebraically for further analysis. There are so many different 
types of mathematical lines that it is usually possible to find one 
that will express with sufficient accuracy almost any relationship. 

Least Squares Method 

The type of line most frequently used is the one fitted by the 
least squares method. The derivation of this formula requires 
the use of the calculus and higher mathematics. In an elementary 
course in statistics such as this, no effort is made to introduce the 
student to the derivation of formulas. That should be reserved 
for more advanced courses. It is fortunate that it is not necessary 
for the student of statistics to understand fully the derivation of 
formulas in order to use them effectively. The brevity and hurry 
of human life make it necessary every day for us to use much 
equipment which we as individuals could not create. Many of us 
use typewriters efficiently, but few of us could create one or even 
repair one. Most adults drive automobiles, but very few of us 
could make one, and our inability to repair them is responsible 
for the large number of garages. The same principle applies to 
fountain pens, electric lights, locks and keys, clocks, watches, 
time tables, calendars, and a thousand other items without which 
modern life would be impossible. The same principle applies to 
statistical formulas. Already you have computed easily several 
types of averages and measures of dispersion without being able 
to create the formulas used. This is not to say that a knowledge 
of the calculus and higher mathematics is not very helpful to the
statistician. It is, and every student who wishes to become an
efficient statistician or go far in research work should have two 
or three years of mathematics. But even so, it is possible to use 
statistical formulas effectively without being able to create them. 

By “least squares” is meant a regression line which is drawn
through the paired items of the data so that the vertical distances
between the points and the regression line, when squared and
summed, make the smallest total possible. It is a line which on
the average comes nearest to all the points or items of data. There
is but one possible location for such a line. This one best position
of total least squares having been located, we have the best fitted
line possible for any one given type of line. To move this line ever
so little, up or down, or to turn it about on its own center point,
would increase the total squared distances of the points of data to
the line. This relationship may be indicated by Fig. 52.

Fig. 52. Location of least squares regression line

In Fig. 52, line A is the best least squares line. It passes nearer 
all the points of data than any other line. If the regression line 
were shifted to the position of line B, some points would be closer 
to the B line than to the A line, but other points would be so 
much farther away that the total squared distances would be 
greater. The total squared distances would be even greater if 
the line were changed to the position of C or D. Not only the 
turning of the line but also the raising or lowering of the line will 
increase the sum of squares. 
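The property just described can be checked numerically. The Python sketch below fits the least squares straight line to the seven pairs of data used later in this chapter. It uses the standard closed-form solution of the normal equations, which is an assumption of this sketch and not the worksheet procedure of the text. It then raises, lowers, and turns the line about its center point (mean of X, mean of Y) and confirms that every such move increases the sum of the squared vertical distances. The shift sizes 0.5 and 0.1 are arbitrary.

```python
def least_squares_line(pairs):
    """Intercept a and slope b solving the normal equations for Y = a + bX."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def sum_of_squared_distances(pairs, a, b):
    """Total of the squared vertical distances from each point to Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in pairs)


pairs = [(5, 7), (4, 5), (9, 10), (8, 8), (2, 4), (3, 6), (7, 9)]
a, b = least_squares_line(pairs)
best = sum_of_squared_distances(pairs, a, b)

# The least squares line passes through the center point (mean of X, mean of Y).
x_bar = sum(x for x, _ in pairs) / len(pairs)
y_bar = sum(y for _, y in pairs) / len(pairs)

alternatives = {
    "raised 0.5": (a + 0.5, b),
    "lowered 0.5": (a - 0.5, b),
    "turned steeper": (y_bar - (b + 0.1) * x_bar, b + 0.1),
    "turned flatter": (y_bar - (b - 0.1) * x_bar, b - 0.1),
}
for label, (a_alt, b_alt) in alternatives.items():
    worse = sum_of_squared_distances(pairs, a_alt, b_alt) > best
    print(label, worse)  # True for every alternative line
```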

LOCATION AND DESCRIPTION OF STRAIGHT LINE 
The mathematical formula for a straight line is 

Y = a + bX 

in which Y and X represent the measurements of pairs of data. 
Such a line is located in a plane as follows: 
Meaning of symbols:

The Y-axis is the vertical line
The X-axis is the horizontal line

0 = point of origin, intersection of axes
+Y = measures on Y-axis above zero
-Y = measures on Y-axis below zero
+X = measures on X-axis to right of zero
-X = measures on X-axis to left of zero
Abscissa = a measurement along X
Ordinate = a measurement along Y

The upper right-hand quadrant where X and Y are both plus
(+) is the part of the graph usually used in trend analysis, but 
the other quadrants are also required. Figure 53 illustrates 
these points. 

In Fig. 53 three trend lines, MN, OP, and QR, are shown.
Meaning of symbols:

Y = dependent series of data measured on vertical axis
X = independent series of data measured on horizontal axis

Fig. 53. Showing location and meaning of a and b in regression lines
a = Y-intercept, the distance between the point of origin, zero,
and the point where the regression line crosses the Y-axis. It
measures how high or low the line is on the graph. If the line
crosses the Y-axis below zero, a is minus (-); if above zero, it is
plus (+).

b = the slope of the line: how flat or steep the line is, or how many
units it changes on Y for a change of one unit on X. If values
of Y decline as one moves to the right on X, b is minus (-). If
values of Y increase as one moves to the right on X, b is plus (+).

X and Y are known. They are the paired items of the data.

a and b are the unknowns. Since there are two of them, they
must be obtained by the solution of two simultaneous equations.

The worksheet and formulas for computing such a line are 
relatively simple and need give the student with a limited knowl- 
edge of algebra little difficulty. The two necessary equations are 
formed as follows: 

Taking the basic equation, Y = a + bX, for each pair of items
in the data, the relationships are expressed in Column 1. 


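Equations in Symbols

$$
\begin{array}{lll}
\text{Column 1} & \text{Column 2 (Column 1 multiplied by coefficient of } a) & \text{Column 3 (Column 1 multiplied by coefficient of } b) \\
Y' = a + bX' & 1Y' = 1a + b\,1X' & X'Y' = aX' + b{X'}^{2} \\
Y'' = a + bX'' & 1Y'' = 1a + b\,1X'' & X''Y'' = aX'' + b{X''}^{2} \\
Y''' = a + bX''' & 1Y''' = 1a + b\,1X''' & X'''Y''' = aX''' + b{X'''}^{2} \\
\vdots & \vdots & \vdots \\
Y_n = a + bX_n & 1Y_n = 1a + b\,1X_n & X_nY_n = aX_n + bX_n^{2} \\
\hline
 & \Sigma Y = Na + b\,\Sigma X & \Sigma XY = a\,\Sigma X + b\,\Sigma X^{2}
\end{array}
$$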

In Column 2 we have each equation in Column 1 multiplied
through by the coefficient of a in Column 1. This coefficient
is unity and, therefore, does not change the size of the equation.
In Column 3 we have each equation in Column 1 multiplied
through by the coefficient of b, which is X. Summing the equations
in Column 2 and in Column 3 gives the two equations derived by
this method, which are called the Normal Equations:


Formula No. 30

$$
\begin{aligned}
\Sigma Y &= Na + b\,\Sigma X \\
\Sigma XY &= a\,\Sigma X + b\,\Sigma X^{2}
\end{aligned}
$$

The solution of these two equations gives us values for a and
for b.

Equations in Data

$$
\begin{array}{lll}
\text{No. 1} & \text{No. 2} & \text{No. 3} \\
Y = a + bX & \text{(No. 1 multiplied by coefficient of } a) & \text{(No. 1 multiplied by coefficient of } b) \\
7 = a + 5b & 7 = 1a + 5b & 35 = 5a + 25b \\
5 = a + 4b & 5 = 1a + 4b & 20 = 4a + 16b \\
10 = a + 9b & 10 = 1a + 9b & 90 = 9a + 81b \\
8 = a + 8b & 8 = 1a + 8b & 64 = 8a + 64b \\
4 = a + 2b & 4 = 1a + 2b & 8 = 2a + 4b \\
6 = a + 3b & 6 = 1a + 3b & 18 = 3a + 9b \\
9 = a + 7b & 9 = 1a + 7b & 63 = 7a + 49b \\
\hline
 & 49 = 7a + 38b & 298 = 38a + 248b
\end{array}
$$


Solution of Simultaneous Equations

$$
\begin{aligned}
298 &= 38a + 248b && \text{(divide equation by 38)} \\
49 &= 7a + 38b && \text{(divide equation by 7)}
\end{aligned}
$$

Divide each equation through by the coefficient of a as follows,
then subtract No. 2 from No. 1:

$$
\begin{aligned}
\text{No. 1:}\quad 7.842 &= a + 6.526b \\
\text{No. 2:}\quad 7.000 &= a + 5.428b \\
.842 &= 1.098b \\
b &= \frac{.842}{1.098} = .767
\end{aligned}
$$

Substituting the numerical value of b in equation No. 2,

$$
a = 7.0 - 5.428b = 7.0 - 5.428(.767) = 7.0 - 4.163 = 2.837
$$

For these data the equation, Y = a + bX, becomes
Y = 2.837 + .767X.
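The worksheet procedure above can be sketched in Python for the same seven pairs. The sketch forms the sums for the normal equations and then eliminates a by dividing each equation through by its coefficient of a. It keeps full precision instead of rounding to three decimals, so it gives a = 2.836, whereas the text gets 2.837 because it rounds b and 5.428 before substituting. The function names are hypothetical, and the sketch assumes the sum of X is not zero.

```python
def normal_equation_sums(pairs):
    """N, sum of X, sum of Y, sum of X squared, and sum of XY."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    return n, sum_x, sum_y, sum_x2, sum_xy


def solve_normal_equations(pairs):
    """Solve  sum_y = N*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_x2  for a and b."""
    n, sum_x, sum_y, sum_x2, sum_xy = normal_equation_sums(pairs)
    # Divide each equation through by its coefficient of a (assumes sum_x != 0).
    left_1, b_coef_1 = sum_xy / sum_x, sum_x2 / sum_x  # equation No. 1
    left_2, b_coef_2 = sum_y / n, sum_x / n            # equation No. 2
    # Subtract No. 2 from No. 1; a cancels, leaving one equation in b.
    b = (left_1 - left_2) / (b_coef_1 - b_coef_2)
    # Substitute b in equation No. 2 to find a.
    a = left_2 - b_coef_2 * b
    return a, b


pairs = [(5, 7), (4, 5), (9, 10), (8, 8), (2, 4), (3, 6), (7, 9)]
print(normal_equation_sums(pairs))  # (7, 38, 49, 248, 298)
a, b = solve_normal_equations(pairs)
print(round(a, 3), round(b, 3))     # 2.836 0.767
```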
At the point on the X-axis where X = 0, the associated average
value of Y = a, or 2.837. There is a specific Y-value for every
specific value of X. The value for b, .767, indicates that for every
change of one unit in X there is a corresponding change of .767
of one unit in Y.

Significance of Sign of b 

The algebraic sign of b in the equation, Y = a + bX, is the
indicator of the direction of the slope of the regression line. If b is
plus (+), the relationship between X and Y is positive, and values
for Y increase in the ratio of b : 1 as values of X increase. On
the graph the trend line rises to the right. If, however, the sign
of b is minus (-), there is a negative or inverse relationship
between X and Y, and values of Y decrease as values of X increase.
The trend line declines as the X values increase.

The graph reveals that the actual values of the data do not fall
exactly on the regression line. Since all values for Y estimated
from the equation Y = a + bX do fall exactly on the trend line, it
is evident that there is a discrepancy or variation between the
data and the computed average values. It is very important in
statistics that this error be measured.


STANDARD ERROR OF ESTIMATE 

Since one of the greatest values of the regression equation is its
use in estimating average values for Y from specific values of X,
and since the actual and the estimated values are usually not
identical, the standard error of estimate was devised to measure
this error. The standard error of estimate is the square root of the
mean of the squared deviations between the actual values of Y and
the estimated values of Y.

Symbols: Sy = standard error of estimate
Y' = estimated values of Y
z = (Y − Y')

Formula No. 31
$S_y = \sqrt{\frac{\sum (Y - Y')^2}{N}}$

or Formula No. 32
$S_y = \sqrt{\frac{\sum z^2}{N}}$
The detailed worksheet for the computation of Sy from indi-
vidual items of data is as follows:

WORKSHEET NO. 35

Computation of Standard Error of Estimate
for Individual Items of Data

 X    Y    a + bX = Y'              (Y − Y') = z        z²
           (2.837 + .767X)

 5    7    2.837 + 3.835 = 6.672       + .328        .107584
 4    5    2.837 + 3.068 = 5.905       − .905        .819025
 9   10    2.837 + 6.903 = 9.740       + .260        .067600
 8    8    2.837 + 6.136 = 8.973       − .973        .946729
 2    4    2.837 + 1.534 = 4.371       − .371        .137641
 3    6    2.837 + 2.301 = 5.138       + .862        .743044
 7    9    2.837 + 5.369 = 8.206       + .794        .630436

Sums                                   − .005       3.452059
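The long method of Worksheet No. 35 can be sketched in a few lines of Python. This assumes the rounded coefficients a = 2.837 and b = .767 from the text; it lists each estimate Y', the deviation z = Y − Y', and z², then takes Sy as the square root of the mean of the z² values (about .70 for these seven items).

```python
import math

# Long method: estimate Y' = a + bX for each item, take z = Y - Y',
# and Sy = sqrt(sum(z^2) / N).
a, b = 2.837, 0.767          # coefficients as rounded in the text
X = [5, 4, 9, 8, 2, 3, 7]
Y = [7, 5, 10, 8, 4, 6, 9]

rows = []
for x, y in zip(X, Y):
    y_est = a + b * x
    z = y - y_est
    rows.append((x, y, y_est, z, z * z))

sum_z = sum(r[3] for r in rows)
sum_z2 = sum(r[4] for r in rows)
s_y = math.sqrt(sum_z2 / len(rows))

for x, y, y_est, z, z2 in rows:
    print(f"{x:>2} {y:>3} {y_est:7.3f} {z:+7.3f} {z2:9.6f}")
print(f"sum z = {sum_z:+.3f}, sum z^2 = {sum_z2:.6f}, Sy = {s_y:.3f}")
```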




The standard error of estimate occupies the same relationship
to a regression line that a standard deviation does to a mean.
σy measures the scatter of items around a mean; Sy measures
the scatter of the Y-data around the regression line. Sy is
always measured vertically, along the Y-axis, from the regression
line. One Sy plus and minus from the regression line includes approxi-
mately .6827 of the items of data in a large sample, or about two-
thirds of the data, when the items are normally distributed about
the line. The detailed method given in Worksheet No. 35 is
presented not as a model to be followed in working problems of
any considerable size, but as a teaching device that holds all the
relationships involved, as it were, under a methodological
microscope so that the student can see every detail of the rela-
tionships and processes involved. For actual statistical work
much shorter methods are available, but they tend to obscure the
details of the processes for the beginning student. Formulas
Nos. 37 and 38 supply a better method to follow in computing
Sy for large samples. They can be worked from the deviations
appearing in the correlation and regression worksheet.
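The .6827 figure is the share of a normal distribution lying within one standard deviation of its center. Assuming the items scatter normally about the regression line, a short Python check using the standard library's `math.erf` reproduces it (the helper name `share_within` is illustrative only):

```python
import math

def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of its center."""
    return math.erf(k / math.sqrt(2))

print(f"{share_within(1):.4f}")   # 0.6827, about two-thirds
print(f"{share_within(2):.4f}")   # 0.9545
```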
By using Worksheet No. 36 it is possible to reduce the volume
of work required very materially. The first step is to change
the form of the normal equations. These are transformed from

(1) $\sum Y = Na + b\sum X$

(2) $\sum XY = a\sum X + b\sum X^2$

in which only the totals of the original data can be used, to the de-
rived equations,

(1) Formula No. 33: $b_{yx} = \frac{\sum xy}{\sum x^2}$   or   Formula No. 35: $b_{xy} = \frac{\sum xy}{\sum y^2}$

(2) Formula No. 34: $a = \bar{Y} - b_{yx}\bar{X}$   or   Formula No. 36: $a = \bar{X} - b_{xy}\bar{Y}$

in which x and y are deviations from the means of X and Y.


(1) Equation (1) of the normal equations is changed to equation (2)
of the derived equations as follows:

$\sum Y = N\bar{Y}$, $\sum X = N\bar{X}$,

therefore $\sum Y = Na + b\sum X$ equals $N\bar{Y} = Na + bN\bar{X}$;
dividing by N the equation becomes

$\bar{Y} = a + b\bar{X}$, or $a = \bar{Y} - b\bar{X}$

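(2) Equations (1) and (2) of the normal equations are changed to equa-
tion (1) of the derived equations as follows:

(1) Multiply normal equation (1) through by $\bar{X}$:

$\bar{X}\sum Y = aN\bar{X} + b\bar{X}\sum X$

(2) Change to $N\bar{X}\bar{Y} = aN\bar{X} + bN\bar{X}^2$

(3) Subtract from normal equation (2):

$\sum XY = aN\bar{X} + b\sum X^2$
$N\bar{X}\bar{Y} = aN\bar{X} + bN\bar{X}^2$

$\sum XY - N\bar{X}\bar{Y} = b\sum X^2 - bN\bar{X}^2$

Collect terms,

$\sum XY - N\bar{X}\bar{Y} = b(\sum X^2 - N\bar{X}^2)$

$b = \frac{\sum XY - N\bar{X}\bar{Y}}{\sum X^2 - N\bar{X}^2}$

but

$X - \bar{X} = x$

square

$(X - \bar{X})^2 = X^2 - 2X\bar{X} + \bar{X}^2 = x^2$

sum

$\sum X^2 - 2\bar{X}\sum X + N\bar{X}^2 = \sum x^2$

equals

$\sum X^2 - 2N\bar{X}^2 + N\bar{X}^2 = \sum x^2$

equals

$\sum X^2 - N\bar{X}^2 = \sum x^2$

and

$(X - \bar{X})(Y - \bar{Y}) = xy$

equals

$XY - X\bar{Y} - \bar{X}Y + \bar{X}\bar{Y} = xy$

sum

$\sum XY - \bar{Y}\sum X - \bar{X}\sum Y + N\bar{X}\bar{Y} = \sum xy$

and

$\sum XY - N\bar{X}\bar{Y} - N\bar{X}\bar{Y} + N\bar{X}\bar{Y} = \sum xy$
$\sum XY - N\bar{X}\bar{Y} = \sum xy$

Therefore

$b = \frac{\sum XY - N\bar{X}\bar{Y}}{\sum X^2 - N\bar{X}^2} = \frac{\sum xy}{\sum x^2}$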

In the formula $b_{yx} = \frac{\sum xy}{\sum x^2}$, in which b is written b_yx, the yx are
subscripts of b and indicate that this particular b measures the
relationship between X and Y when Y is the dependent variable
on the vertical axis and X is the independent variable on the
horizontal axis.

In the formula $b_{xy} = \frac{\sum xy}{\sum y^2}$, the xy are subscripts of b and in-
dicate that this particular b measures the relationship between
X and Y when X is the dependent variable on the vertical axis
and Y is the independent variable on the horizontal axis. These
two formulas for the same data will not ordinarily give identical
regression lines. Ordinarily only b_yx [read b (y on x)] is computed.
In the equation Y = a + bX the Y-values are estimated from the
X-values, but in the equation X = a + bY the X-values are esti-
mated from the Y-values.
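The deviation formulas can be checked numerically on the seven pairs used earlier in this chapter. In this Python sketch (variable names are illustrative), b_yx computed from deviations agrees with the raw-sum form derived above, while b_xy yields a different line, X = −2.571 + 1.1429Y, which is not the same line as Y = 2.836 + .767X.

```python
X = [5, 4, 9, 8, 2, 3, 7]
Y = [7, 5, 10, 8, 4, 6, 9]
n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Deviations from the means: x = X - mean(X), y = Y - mean(Y)
sum_xy = sum((xv - mean_x) * (yv - mean_y) for xv, yv in zip(X, Y))
sum_x2 = sum((xv - mean_x) ** 2 for xv in X)
sum_y2 = sum((yv - mean_y) ** 2 for yv in Y)

b_yx = sum_xy / sum_x2            # Formula No. 33: Y estimated from X
a_yx = mean_y - b_yx * mean_x     # Formula No. 34
b_xy = sum_xy / sum_y2            # Formula No. 35: X estimated from Y
a_xy = mean_x - b_xy * mean_y     # Formula No. 36

# Raw-sum form: (sum XY - N*mean_x*mean_y) / (sum X^2 - N*mean_x^2)
b_raw = (sum(xv * yv for xv, yv in zip(X, Y)) - n * mean_x * mean_y) / (
    sum(xv * xv for xv in X) - n * mean_x ** 2)

print(f"Y = {a_yx:.3f} + {b_yx:.4f}X")
print(f"X = {a_xy:.3f} + {b_xy:.4f}Y")
```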
Method of Computing Deviations in Standard
Worksheet No. 36

(1) The two paired variables of the original data are laid
down in Columns X and Y. Their squares and products are
computed.

(2) The five columns are summed.

(3) Means are computed for X and Y.

(4) The corrections are computed as follows:

(a) $\sum X \times \bar{X}$ = correction for $\sum X^2$

(b) $\sum Y \times \bar{Y}$ = correction for $\sum Y^2$

(c) $\sum X \times \bar{Y}$, or $\sum Y \times \bar{X}$, equals correction for $\sum XY$.

(5) The deviations from the means are obtained by subtracting
the corrections from the sums immediately above them.

Using the equations in terms of deviations, $b = \frac{\sum xy}{\sum x^2}$ and
$a = \bar{Y} - b\bar{X}$, the results for Worksheet No. 36 are:
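Steps (3) through (5) amount to a small routine. The Python function below (`deviation_sums` is a hypothetical name) takes the raw totals and returns the means and the three deviation sums. Because it carries the means to full precision, it gives roughly 2,167.0, 27,083.7, and 7,065.4 for the totals of Worksheet No. 36, a little different from the 2,172, 27,088, and 7,069 shown there, which use means carried to only three decimal places.

```python
def deviation_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Convert raw-data totals to sums of deviations from the means:
    compute the means, form the corrections, and subtract them."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    corr_x2 = sum_x * mean_x      # correction for sum X^2
    corr_y2 = sum_y * mean_y      # correction for sum Y^2
    corr_xy = sum_x * mean_y      # correction for sum XY (equals sum_y * mean_x)
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "x2": sum_x2 - corr_x2,
        "y2": sum_y2 - corr_y2,
        "xy": sum_xy - corr_xy,
    }

# Totals from Worksheet No. 36 (106 children)
d = deviation_sums(106, 5558, 6896, 293_595, 475_714, 368_650)
print({k: round(v, 2) for k, v in d.items()})
```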


WORKSHEET NO. 36

Computation of Least Squares Regression Line Between
Heights and Weights of 106 School Children

Child        Height*    Weight†
No.            X           Y            X²           Y²           XY

1              53          61         2,809        3,721        3,233
2              46          40         2,116        1,600        1,840
3              51          49         2,601        2,401        2,499
...
106            58          76         3,364        5,776        4,408

Sums        5,558       6,896       293,595      475,714      368,650
Means      52.433      65.056
Corrections                         291,423      448,626      361,581
Deviations                            2,172       27,088        7,069

* Inches.
† Pounds.
$b = \frac{\sum xy}{\sum x^2} = \frac{7{,}069}{2{,}172} = 3.2546$

$a = \bar{Y} - b\bar{X} = 65.056 - 3.2546 \times 52.433$
$= 65.056 - 170.648$
$= -105.6$

$Y = a + bX$ is $Y = -105.6 + 3.2546X$

From this equation it is possible to compute the expected or 
average estimated weight of a child, if his height is known. Since 
all such estimates are made from the average relationship in the 
sample there will be an error or deviation between the actual 
weight and the estimated weight in most cases. To measure this 
error or deviation the computation of the standard error of es- 
timate is necessary. In any case in which the regression line 
passes through all the items of data, the standard error of estimate 
will be zero (0). 
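A brief Python sketch of using the fitted equation for estimation, taking the deviation sums and means exactly as printed in Worksheet No. 36. The function name `estimated_weight` and the 55-inch height are illustrative assumptions, not values from the text.

```python
# Regression of weight (Y, pounds) on height (X, inches), using the
# deviation sums and means printed in Worksheet No. 36.
sum_xy, sum_x2 = 7_069, 2_172
mean_x, mean_y = 52.433, 65.056

b = sum_xy / sum_x2
a = mean_y - b * mean_x

def estimated_weight(height_in):
    """Average weight estimated from height with Y = a + bX."""
    return a + b * height_in

print(f"Y = {a:.1f} + {b:.4f}X")
print(f"estimated weight at 55 inches: {estimated_weight(55):.1f} lb")
```

Any single child's actual weight will usually differ from this average estimate; the standard error of estimate measures the typical size of that difference.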


Short Method for Computing Standard Error of Estimate

It would ordinarily involve too much time and money to compute
Sy by the long method used in Worksheet No. 35.
Fortunately a much shorter method is available.

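Formula No. 37

$S_y = \sqrt{\frac{\sum y^2 - \frac{(\sum xy)^2}{\sum x^2}}{N}}$

or Formula No. 38*

$S_y = \sqrt{\frac{\sum y^2 - b_{yx}\sum xy}{N}}$

* Since Y − Y' = z, and since by shifting the point of origin from the
intersection of the X- and Y-axes to the intersection of the means of
X and Y the estimated Y becomes bx in terms of deviations from the
means, z = y − bx. Then

$z^2 = (y - bx)^2 = y^2 - 2bxy + b^2x^2$

Summed,

$\sum z^2 = \sum y^2 - 2b\sum xy + b^2\sum x^2$

and substituting $b = \frac{\sum xy}{\sum x^2}$,

$\sum z^2 = \sum y^2 - 2\frac{(\sum xy)^2}{\sum x^2} + \frac{(\sum xy)^2}{\sum x^2} = \sum y^2 - \frac{(\sum xy)^2}{\sum x^2}$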
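For the weight of the school children Sy is 6.2 pounds:

$S_y = \sqrt{\frac{\sum y^2 - \frac{(\sum xy)^2}{\sum x^2}}{N}} = \sqrt{\frac{27{,}088 - \frac{(7{,}069)^2}{2{,}172}}{106}} = \sqrt{\frac{27{,}088 - 23{,}006.8}{106}} = \sqrt{38.5018} = 6.2$ pounds

or

$S_y = \sqrt{\frac{\sum y^2 - b_{yx}\sum xy}{N}} = \sqrt{\frac{27{,}088 - 3.2546 \times 7{,}069}{106}} = \sqrt{\frac{27{,}088 - 23{,}006.8}{106}} = \sqrt{\frac{4{,}081.2}{106}} = \sqrt{38.5018} = 6.2$ pounds.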


The standard error of estimate is expressed in the units of the
original data of Y, the dependent variable.

The dependability of Sy, like that of all other statistics, rests on the
adequacy of the sample from which it is computed. If the sample
is too small, Sy is subject to a large degree of variability and un-
certainty.
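Formulas Nos. 37 and 38 can be evaluated side by side. This Python sketch uses the deviation sums of Worksheet No. 36 and the rounded b_yx = 3.2546 from the text; both forms come to about 6.2 pounds, differing only very slightly because b is rounded.

```python
import math

# Short method with the deviation sums of Worksheet No. 36
N = 106
sum_y2, sum_xy, sum_x2 = 27_088, 7_069, 2_172
b_yx = 3.2546                  # as rounded in the text

s_y_37 = math.sqrt((sum_y2 - sum_xy ** 2 / sum_x2) / N)   # Formula No. 37
s_y_38 = math.sqrt((sum_y2 - b_yx * sum_xy) / N)          # Formula No. 38
print(f"Formula 37: {s_y_37:.1f} lb, Formula 38: {s_y_38:.1f} lb")
```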


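(Footnote to Formula No. 38, continued.) Since $S_y^2 = \frac{\sum z^2}{N}$ and
$b_{yx} = \frac{\sum xy}{\sum x^2}$, so that $\frac{(\sum xy)^2}{\sum x^2} = b_{yx}\sum xy$, then

$S_y^2 = \frac{\sum y^2 - \frac{(\sum xy)^2}{\sum x^2}}{N} = \frac{\sum y^2 - b_{yx}\sum xy}{N}$

$S_y = \sqrt{\frac{\sum y^2 - \frac{(\sum xy)^2}{\sum x^2}}{N}} = \sqrt{\frac{\sum y^2 - b_{yx}(\sum xy)}{N}}$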
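The footnote's algebra says the long and short methods must agree when a and b are not rounded. A Python sketch (illustrative only) confirms this on the seven pairs of Worksheet No. 35: all three forms give Σz² ≈ 3.45205. Using the rounded coefficients 2.837 and .767 instead gives a very slightly larger total, since any line other than the least squares line has a larger sum of squared deviations.

```python
import math

X = [5, 4, 9, 8, 2, 3, 7]
Y = [7, 5, 10, 8, 4, 6, 9]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))
sx2 = sum((x - mx) ** 2 for x in X)
sy2 = sum((y - my) ** 2 for y in Y)
b = sxy / sx2                  # full-precision least squares slope
a = my - b * mx

long_form = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))  # sum of (Y - Y')^2
short_37 = sy2 - sxy ** 2 / sx2                                # Formula No. 37 numerator
short_38 = sy2 - b * sxy                                       # Formula No. 38 numerator
print(f"{long_form:.6f} {short_37:.6f} {short_38:.6f}")
print(f"Sy = {math.sqrt(long_form / n):.4f}")
```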
WORKSHEET NO. 37

Computation of Regression Between Circulation of 6 Women's
Magazines (X) and per Capita Total Retail Sales (Y)
by Counties for Maryland, 1939*

Magazine          Per Capita
Circulation       Sales
per 1000 Pop.
     X                Y            X²           Y²           XY

   159              279         25,281       77,841       44,361
   114              184         12,996       33,856       20,976
    67              137          4,489       18,769        9,179
    79              126          6,241       15,876        9,954
   112              213         12,544       45,369       23,856
   124              184         15,376       33,856       22,816
   129              181         16,641       32,761       23,349
    58              133          3,364       17,689        7,714
    85              161          7,225       25,921       13,685
   127              228         16,129       51,984       28,956
    64              129          4,096       16,641        8,256
   131              182         17,161       33,124       23,942
    75              142          5,625       20,164       10,650
   116              199         13,456       39,601       23,084
   141              268         19,881       71,824       37,788
   133              189         17,689       35,721       25,137
    76              161          5,776       25,921       12,236
    48              105          2,304       11,025        5,040
    68              102          4,624       10,404        6,936
   127              235         16,129       55,225       29,845
   150              259         22,500       67,081       38,850
   136              232         18,496       53,824       31,552
   114              216         12,996       46,656       24,624

Sums       2,433          4,245        281,019      841,133      482,786
Means    105.782        184.565
Corrections                            257,368      783,478      449,047
Deviations                              23,651       57,655       33,739

* Source: Census of Retail Trade, 1939.
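$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{33{,}739}{23{,}651} = 1.4265$

$a = \bar{Y} - b_{yx}\bar{X} = 184.565 - 1.4265 \times 105.782$
$= 184.565 - 150.898$
$= 33.667$

$Y = a + b_{yx}X = 33.667 + 1.4265X$

$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{33{,}739}{57{,}655} = .5852$

$a = \bar{X} - b_{xy}\bar{Y} = 105.782 - .5852 \times 184.565$
$= 105.782 - 108.007$
$= -2.225$

$X = a + b_{xy}Y = -2.225 + .5852Y$

$S_y = \sqrt{\frac{\sum y^2 - b_{yx}\sum xy}{N}} = \sqrt{\frac{57{,}655 - (1.4265 \times 33{,}739)}{23}} = \sqrt{\frac{57{,}655 - 48{,}129}{23}} = \sqrt{\frac{9{,}526}{23}} = \sqrt{414.2}$

$S_y = \sqrt{414.2} = 20.35$ dollars of per capita sales.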
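Recomputing Worksheet No. 37 directly from the 23 county pairs is a useful check. The Python sketch below assumes nothing beyond the printed X and Y columns. It gives Σxy ≈ 33,639 rather than 33,739, b_yx ≈ 1.4224, a ≈ 34.10, and Sy ≈ 20.65 dollars. The difference of 100 in Σxy appears to trace to the XY entry for the county with X = 131 and Y = 182, printed as 23,942, whereas 131 × 182 = 23,842; the estimates in the text rest on the printed totals.

```python
import math

# X = circulation of 6 women's magazines per 1,000 population,
# Y = per capita total retail sales (dollars), Maryland counties, 1939
X = [159, 114, 67, 79, 112, 124, 129, 58, 85, 127, 64, 131,
     75, 116, 141, 133, 76, 48, 68, 127, 150, 136, 114]
Y = [279, 184, 137, 126, 213, 184, 181, 133, 161, 228, 129, 182,
     142, 199, 268, 189, 161, 105, 102, 235, 259, 232, 216]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
# Deviation sums by the correction method: raw sum minus correction
sxy = sum(x * y for x, y in zip(X, Y)) - sum(X) * my
sx2 = sum(x * x for x in X) - sum(X) * mx
sy2 = sum(y * y for y in Y) - sum(Y) * my

b_yx = sxy / sx2
a_yx = my - b_yx * mx
s_y = math.sqrt((sy2 - b_yx * sxy) / n)
print(f"Y = {a_yx:.3f} + {b_yx:.4f}X, Sy = {s_y:.2f}")
```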


On the assumption that these statistics for the counties of
Maryland would hold for other years, it would be possible to es-
timate the probable per capita retail sales for any county for
which we had figures on the circulation of the six women's maga-
zines. If the magazine circulation for a county were 100 per
thousand population, the estimate would be:

$Y = 33.67 + 1.4265 \times 100 = 33.67 + 142.65 = 176.32$, or $176.32 per capita sales.

The accuracy of this estimate on the basis of one Sy, plus and minus, would be:

176.32 ± 20.35, or $155.97 to $196.67.
The chances are 68 out of 100 that if magazine circulation is
100 per 1,000 population, per capita total retail sales will fall be-
tween $155.97 and $196.67. Although this is not a highly ac-
curate estimate, it is of sufficient accuracy to be of large value to
a national distributing or advertising concern in prorating sales
by counties for Maryland.
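The estimate and its band of one Sy can be written as a small function. This Python sketch (the name `estimate_with_band` is hypothetical) uses the coefficients of Worksheet No. 37 as defaults; reading the band as a 68-in-100 chance assumes, as the text does, that the relationship holds for other years and that the scatter about the line is roughly normal.

```python
def estimate_with_band(x, a=33.667, b=1.4265, s_y=20.35):
    """Estimated Y for a given X, with the band of one Sy below and above.
    Default coefficients are those of Worksheet No. 37."""
    y_est = a + b * x
    return y_est, y_est - s_y, y_est + s_y

est, low, high = estimate_with_band(100)
print(f"{est:.2f} ({low:.2f} to {high:.2f}) dollars per capita")
```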


Fig. 54. Least squares relationship between magazine circulation per 1,000
population (horizontal axis) and per capita sales (vertical axis) by
counties in Maryland, 1939


For the magazine circulation data for Maryland, 7 of the 23
items fall outside of one Sy measured plus and minus from the
regression line. This is a small sample of only 23 items. In a
larger sample the percentage of items within one Sy, plus and minus,
would tend to be nearer the 68% of the normal distribution.
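The count of 7 items outside the band can be verified with a few lines of Python, using the regression line and Sy of Worksheet No. 37 as given in the text. One county (X = 79, Y = 126) lies only about a cent beyond the band, so the count is sensitive to rounding of Sy. The remaining 16 of 23 counties, about 70%, lie within one Sy.

```python
X = [159, 114, 67, 79, 112, 124, 129, 58, 85, 127, 64, 131,
     75, 116, 141, 133, 76, 48, 68, 127, 150, 136, 114]
Y = [279, 184, 137, 126, 213, 184, 181, 133, 161, 228, 129, 182,
     142, 199, 268, 189, 161, 105, 102, 235, 259, 232, 216]
a, b, s_y = 33.667, 1.4265, 20.35        # values from the text

residuals = [y - (a + b * x) for x, y in zip(X, Y)]   # Y - Y'
outside = [r for r in residuals if abs(r) > s_y]
print(f"{len(outside)} of {len(X)} outside; "
      f"{1 - len(outside) / len(X):.1%} within one Sy")
```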
WORKSHEET NO. 38

Computation of Least Squares Straight Line Regression
Between Cost of Wheat per Bushel and Yield
per Acre for 25 North Dakota Farms

    X*       Y†        XY         X²          Y²

     9      184       1656         81      33,856
     8      200       1600         64      40,000
    10      156       1560        100      24,336
    11      129       1419        121      16,641
    33       70       2310       1089       4,900
    20       67       1340        400       4,489
    34       64       2176       1156       4,096
    31       67       2077        961       4,489
    14       93       1302        196       8,649
    18       75       1350        324       5,625
    15       98       1470        225       9,604
    21       73       1533        441       5,329
    16       86       1376        256       7,396
    12      108       1296        144      11,664
    14       89       1246        196       7,921
    19       67       1273        361       4,489
    27       58       1566        729       3,364
    29       60       1740        841       3,600
    32       62       1984       1024       3,844
    17       86       1462        289       7,396
    24       50       1200        576       2,500
    22       54       1188        484       2,916
    26       48       1248        676       2,304
    28       49       1372        784       2,401
    30       60       1800        900       3,600

Sums       520    2,153     38,544     12,418     225,409
Means     20.8    86.12
Corrections                 44,782     10,816   185,416.4
Deviations                  −6,238      1,602    39,992.6

* Yields in bushels per acre.
† Costs in cents per bushel.
$Y = a + bX$

$b = \frac{\sum xy}{\sum x^2} = \frac{-6{,}238}{1{,}602} = -3.894$

$a = \bar{Y} - b\bar{X} = 86.12 - (-3.894)(20.8)$
$= 86.12 + 81.00 = 167.12$

$Y = 167.12 - 3.894X$

$S_y = \sqrt{\frac{\sum y^2 - b_{yx}\sum xy}{N}} = \sqrt{\frac{39{,}992.6 - (-3.894)(-6{,}238)}{25}} = \sqrt{\frac{39{,}992.6 - 24{,}290.8}{25}} = \sqrt{\frac{15{,}701.8}{25}} = \sqrt{628.07} = 25.06$ cents.
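Worksheet No. 38 can be recomputed from its 25 pairs in Python in the same way. This sketch uses unrounded means; it returns b ≈ −3.894, a ≈ 167.12, and Sy ≈ 25.06 cents. The negative b means that, on the average, cost per bushel falls about 3.9 cents for each additional bushel of yield per acre.

```python
import math

# X = yield (bushels per acre), Y = cost (cents per bushel), 25 farms
X = [9, 8, 10, 11, 33, 20, 34, 31, 14, 18, 15, 21, 16,
     12, 14, 19, 27, 29, 32, 17, 24, 22, 26, 28, 30]
Y = [184, 200, 156, 129, 70, 67, 64, 67, 93, 75, 98, 73, 86,
     108, 89, 67, 58, 60, 62, 86, 50, 54, 48, 49, 60]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
# Deviation sums by the correction method: raw sum minus correction
sum_xy = sum(x * y for x, y in zip(X, Y)) - sum(X) * mean_y
sum_x2 = sum(x * x for x in X) - sum(X) * mean_x
sum_y2 = sum(y * y for y in Y) - sum(Y) * mean_y

b = sum_xy / sum_x2            # negative: cost falls as yield rises
a = mean_y - b * mean_x
s_y = math.sqrt((sum_y2 - b * sum_xy) / n)
print(f"Y = {a:.2f} {b:+.3f}X, Sy = {s_y:.2f} cents")
```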


A sufficient number of statistical methods have now been pre- 
sented to make necessary a preliminary discussion of the three 
principal uses of statistics. Briefly, these may be listed as (1) 
description, (2) analysis, and (3) inference or estimation. 

Description is the simplest use which can be made of statistics. 
Any sample or quantity of data may be thrown into an array and 
frequency distribution and have various averages and measures 
of dispersion computed and be depicted in charts and graphs 
with no further purpose than to give an objective mathematical 
and graphic description of these particular data. No further 
attempt may be made to analyze or point out further relation- 
ships which may be revealed or to search out and explain the inner 
and hitherto hidden meanings of the data. No effort may be 
made to use the data or this description of them to estimate, 
forecast, or project inferences as to other related data or to dis- 
cover general principles or laws. Description of a bit of data is 
all that is sought and this having been accomplished no further 
labors are performed. 

Description, however, is usually only a means to an end and is 
ordinarily done to bring to light obscured relationships which it is 
desired to analyze and explain. Analysis and explanation are 
always the legitimate ends of description. Usually a sample of
wheat yields is taken, described, and measured only to reveal
relationships among wheat yields and among them and other 
variables such as rainfall, time of planting, kind of seed, condition 
of seed bed, types of winter weather, prevalence of pests, and cost 
in order to analyze complex farm management problems. An 
insight into relationships and an insight into hidden factors are 
the end of the study. 

But analysis is usually not an end in itself. One analyzes in 
order that he may comprehend and forecast, infer, project, and 
apply his information for further practical or theoretical use. 
The great end of statistical description and analysis is inference. 
To make logical and accurate inferences on the basis of statis- 
tical analysis is a difficult procedure which requires both (1) a 
full and accurate knowledge of the subject matter studied and 
(2) a sufficient knowledge of mathematical methods, logical anal- 
ysis, and wholesome caution. Regression is a basis for estimating. 
But since it rests on the average relationships among paired values 
of data, the smallness of the relationship or the smallness of the 
sample or both will usually bring an error into the computation 
which will limit the force of the inference drawn from the re- 
gression equation. Most situations are so complex, especially in 
the biological and social sciences, that no single independent 
variable accounts for all the change in any specific dependent 
variable. This fact means that there will be a considerable error 
in any estimate made from any regression equation which must 
be taken into consideration in any inference that is drawn. 

The beginning student should be especially careful not to in-
clude more in his inference than is warranted by the careful anal-
ysis of the data. Any estimate of the weight of a child from his
height will likely contain an error, because the weight of children
is also influenced by health, heredity, age, and adequate food.
Per capita sales do not depend at all on magazine circulation, but
both may depend to some degree on the common factor of pur-
chasing power. If persons have more wealth, they subscribe to
more magazines and at the same time make more purchases of
most other goods. Those with little money may buy few or no
magazines and only small quantities of other merchandise.
SUMMARY

1. Regression is the measure of the average relationship between two
variables in terms of the units of the original data. Linear regression is
a straight line relationship.

2. The regression is positive when the number of units of the dependent
variable increases with an increase in the number of units of the independ-
ent variable. The values on Y increase as the values on X increase.

3. The regression is negative when the number of units of the de-
pendent variable decreases with an increase of the independent variable.
The values on Y decrease as the values on X increase.

4. This relation may be described and measured by a freehand line 
drawn through the plotted points of data on a coordinate scale. 

5. This relation may be described and measured by a straight line 
drawn through the class averages of the data divided into two approxi- 
mately equal parts. 

6. This relationship may be described and measured by a least squares
straight line of the form Y = a + bX.

7. The estimated values of the dependent variable may be computed 
from this equation for all values of the independent variable. 

8. The standard error of estimate is the square root of the mean of the
squared deviations of the data from the regression line. This measure
is to a regression line what a standard deviation is to an arithmetic mean.
One Sy, plus and minus, includes about 68% of the items of the sample
in a normal distribution.

9. Regression lines and their standard errors of estimates are effective 
means for discovering scientific relationships and for expressing these 
relationships in accurate mathematical terms for scientific summary and 
inference. 
REVIEW QUESTIONS

1. What does the term “regression” as used in statistics mean?

2. Define “science” and show how regression is related to scientific
knowledge and discoveries.

3. Explain the four steps required to make a “freehand curve.”

4. What are the advantages and disadvantages of a freehand curve? 

5. Explain the six steps required to make a regression line through 
class averages. 

6. Compare the regression line through class averages with the free- 
hand curve as to method and reliability. 

7. Can the standard error of estimate and the coefficient of correlation 
be computed for either or both of these lines and if so, by what methods 
and formulas? 

8. What is a “least squares” regression line?

9. What do “a” and “b” signify in the equation Y = a + bX?
Explain fully.

10. How are the “normal equations” obtained?

11. What is the difference between the long and short methods of 
computing a least squares regression line? 

12. Show the complete algebraic transformation of the two normal
equations

$\sum Y = Na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$

into the equation $b = \frac{\sum xy}{\sum x^2}$.

13. Show that $(\sum Y = Na + b\sum X) = (a = \bar{Y} - b\bar{X})$.

14. What is the standard error of estimate? How may it be com- 
puted? Why is it important in connection with regression lines? Ex- 
plain fully. 

15. What does extrapolation mean? To what extent is it dependable? 

16. In what units is the standard error of estimate measured? Why? 

17. What changes occur in the standard error of estimate as the “fit” 
of the regression improves? Why? 

18. What is the difference between the equation $Y = a + b_{yx}X$ and
the equation $X = a + b_{xy}Y$? Explain fully.

19. Explain why no regression line is complete without its standard 
error of estimate. 

20. What advantages are there in a mathematical regression line over 
the free-hand type? 
EXERCISES


1. Yield in bushels of wheat per acre on (1) fertilized soil and (2) un-
fertilized soil at Oklahoma A. and M. College, 1899-1940 inclusive. By
Horace J. Harper, Head, Soils Department.

Year  Fertilized  Unfertilized   Year  Fertilized  Unfertilized   Year  Fertilized  Unfertilized
1899     30.6        12.0        1913     14.8         5.6        1927      4.7         1.4
1900     36.8        18.1        1914     33.5        23.3        1928     26.2        15.8
1901     37.7        28.0        1915     19.5        15.2        1929     16.4         9.1
1902     17.4        15.3        1916     13.3         7.9        1930     19.1         7.9
1903     27.6        20.3        1917     32.0        21.0        1931     25.0        25.6
1904     15.7        12.6        1918     29.2        10.7        1932     30.2        19.3
1905     11.7         4.7        1919     11.6         7.0        1933     28.0        12.5
1906     23.3         7.1        1920     34.0        27.3        1934     12.7        12.7
1907     14.9         5.2        1921     15.7         7.3        1935     27.7        14.0
1908     15.5        12.9        1922      7.4         3.8        1936     21.8        19.3
1909     25.4        21.7        1923     23.5        12.9        1937     28.3        22.0
1910     35.2        18.7        1924     17.7         7.7        1938     10.2         3.4
1911      4.9         2.3        1925     20.7        11.4        1939     25.2        15.3
1912     20.4         5.3        1926      7.0         7.1        1940     28.2        15.2


2. Population and Cost of Protection of Property of U. S. Cities of
400,000 to 100,000 population. “Financial Statistics of Cities Having a
Population of Over 30,000, 1922,” Bureau of Census. (In 1,000's)

Population   Cost of Protection     Population   Cost of Protection
             of Property                         of Property
   400           $2,079                168           $1,048
   339            2,309                162             [?]35
   335            2,515                151            1,342
   316            2,591                150              791
   312            2,175                145            1,058
   306            3,547                142              792
   269            1,750                140              698
   268            1,457                140            1,212
   255            1,260                139              883
   241            1,865                134              812
   240            1,778                132              644
   237            1,254                127              982
   230            1,432                125            1,081
   218            1,023                122              802
   208              724                120              612
   201            1,209                114              585
   190              822                114              505
   188            1,356                111              780
   181            1,205                110              524
   179            1,056                105              805
   174              763                104              550
   170            1,315                102              612
   168              914                100              569


3. Compute regression and standard error of estimate for data on 106 
Stillwater, Oklahoma, grade school children in Worksheet No. 1, Chap- 
ter 2. 




CHAPTER 12


CORRELATION AND DETERMINATION


Statistical methods are devices for analyzing the relationships 
which exist within and among groups of related numbers. The 
more complete and varied our analysis is, the better we shall 
understand the meaning of the numbers. Up to this time we 
have learned to measure separate frequency distributions as to 
central tendency and dispersion and also the relationship between 
two related frequency distributions by means of regression lines. 
All of these measures are in terms of the units of the original data. 
If our original data are bushels of wheat, all our measures, aver- 
ages, means, medians, modes, dispersions, standard deviation, re- 
gression, and standard error of estimate — all will be so many 
bushels of wheat. If our original data are in tons, all our statistics
so far calculated will be in tons. If our original data are in dollars, 
all the statistics we have learned to compute up to this time will 
also be in dollars, except the coefficient of variation. 

There is, however, need for another type of measurement, a 
ratio, or relative number, or percentage statement of the rela- 
tionship between the variables. What percent of the weight of
an individual, on the average, can be accounted for by his height? 
What percent of the cost of a bushel of wheat can be accounted 
for by the yield per acre? To what degree is variation in the 
two variables associated? 

This measure is not expressed in terms of the units of the
original data, but is an abstract number based on 1, or 100, as
unity. It is ordinarily called the coefficient of correlation. Cor-
relation is the square root of a percentage, and is, therefore,
quite misleading to the beginner in statistics.

A more easily understood, and in many respects a better meas-
ure, is the coefficient of determination, which is a true percentage
of the portion of one variable that is associated with another.
Determination is the square of correlation, and is coming into
general use as the more accurate and easily understood measure.
Both will be explained and computed in this chapter. Correla-
tion will be presented first because it is the older of the two.

The relationship between determination and correlation may
be illustrated as follows:

Coefficient of     Coefficient of
Correlation        Determination
    r                r² = %

  1.00               1.00
   .90                .81
   .80                .64
   .70                .49
   .60                .36
   .50                .25
   .40                .16
   .30                .09
   .20                .04
   .10                .01


Determination is a true percentage and as such is easily under-
stood and correctly used. For instance, one might think that
r = .3 is one-half of r = .6, but in terms of determination (.09
against .36) it is in fact only one-fourth as great. One might think
that r = .2 is one-fourth of r = .8, but actually (.04 against .64)
it is only one-sixteenth as great.
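The comparison can be made mechanical: square each coefficient of correlation and compare the squares. A minimal Python sketch (the helper `determination` is an illustrative name) reproduces the table above and the two ratios.

```python
def determination(r):
    """Coefficient of determination: the square of the coefficient of correlation."""
    return r * r

for r in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1):
    print(f"r = {r:.2f}   r^2 = {determination(r):.2f}")

# r = .3 is half of r = .6, but its determination is one-fourth as great
print(f"{determination(0.3) / determination(0.6):.4f}")
# r = .2 is one-fourth of r = .8, but its determination is one-sixteenth as great
print(f"{determination(0.2) / determination(0.8):.4f}")
```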

Correlation is a measure of the amount of variation of one 
variable that is associated with or accounted for by variation in 
another variable or variables. 

In the temperate and frigid zones one observes that the direct-
ness and intensity of the sun's rays vary with the change in sea-
sons of the year. The temperatures of the earth's surface and of
the atmosphere also vary. Many factors may affect the temperature
locally, such as elevation above the sea, nearness to water, direc-
tion of winds, and the protection of mountains, but in spite of all
these, on the average, the two variables of earth temperatures and
directness of solar rays vary together. The more direct the rays,
the greater is the heat. The less direct the rays, the less intense
is the heat. These quantities rise and fall together. Correlation,
or determination, is a method of measuring the similarity of the
change in these two variables.

Another case of correlation is the similarity of the variations 
of the amount of rainfall and plant growth. If the rainfall is less 
than five inches a year, the earth's surface is almost bare. As
rainfall increases, vegetation increases until dense growth or 
jungles appear when the precipitation is over 80 or 100 inches a 
year. Other factors are essential to vegetable growth besides 
moisture, such as temperature, soil fertility, and cultivation, but 
on the average, variation in rainfall and variation in plant growth 
do move closely together. The exact relationship between quan- 
tity of water and rate of plant growth has been worked out for 
certain crops in irrigated areas. Correlation and determination 
are measures of the degree of association in the movements of 
two or more variables. 

The changes in the variation of two variables may be in the 
same direction, as in the two illustrations above, or they may be 
in the opposite direction, as in the case of the amount of cotton 
produced and its price. Plant growth increases as rainfall in- 
creases, but the price of a commodity tends to fall as the supply 
increases. When the changes in the variation of two or more 
variables move in the same direction, the correlation is said to be 
positive. When the changes in variation move in opposite direc- 
tions, the correlation is said to be negative. Correlation may 
range from perfect positive correlation, +1, through zero (0), or
no correlation, to −1, or perfect negative correlation.

Figure 55 shows the simplest explanation of correlation, the 
geometric or graphic presentation. The closer the plotted points 
of data come to the regression line, the higher is the correlation. 
The more widely they are scattered, the lower the correlation. 

The first step in analyzing the correlation of data is the con-
struction of a scatter diagram of the paired items, as shown in Fig. 55.
Fig. 55. Indicating the variation in correlation from +1
through 0 to −1. Panels: (A) positive correlation; (B) little or no
correlation; (C) negative correlation

Although the scatter diagram does not give a mathematical 
measure of the relationship, it does indicate quite clearly whether 
any significant correlation does exist between the variables. 

In Fig. 55A, only low values of X are associated with low values
of Y, and only high values of X are associated with high values
of Y. In Fig. 55C, the reverse is true. Small X values accompany
only large Y values, and large X values have only small Y values.
In Fig. 55B, however, any value of X, large or small, is associated
with any Y value, large or small. In such a case there is little
or no correlation. If the correlation is perfect, the plotted points
of the data all fall exactly on the regression line. Since all estimated
values of the data, the values computed from the regression equa-
tion, Y = a + bX, or whatever it may be, always fall exactly on
the regression line, it follows that when correlation is perfect, the
actual data and the estimated values are identical. In such cases
both fall exactly on the regression line. If the correlation is less
than perfect, the plotted points will be scattered around the re-
gression line. The smaller the correlation is, the wider is the
scatter of dots, or data.

REGRESSION AND CORRELATION

For any specific data there is a close relationship between the 
regression line and the coefficient of correlation. The latter 
definitely depends upon the former. If the regression line does 
not fit the data well, it is impossible to obtain the full correlation.
If a straight line is fitted to data that are truly curvilinear, only 
a portion of the full correlation actually existing in the sample 
can be obtained. One can easily see from Fig. 56 that a curved
line would fit the data on cost of wheat per bushel and yield per
acre much better than a straight line. For the straight line rela-
tionship the determination is 57.2%. For the parabola the
determination is 89.2%. Since the simple parabola is not a per-
fect regression fit for these data, it follows that the figure 89.2%
is not all the determination actually in the sample. The freehand
curve computed in Worksheet No. 107 and shown in Fig. 56 is a
much better fit and gives a determination of about 98%. The
student, therefore, should always keep clearly in mind the fact
that a full and adequate measure of correlation is dependent on a
full and adequate measurement of the regression. If you fail to get
a correct regression line, you cannot get a full measure of correla-
tion.

Fig. 56. Comparison of (A) free-hand curve through class aver-
ages, and (B) least squares straight line for (Y) cost per bushel, in
cents, and (X) yield per acre of wheat

The relationship between regression and correlation is so close 
that some authors combine the two in a single treatment as though 
they were the same thing. Their separate treatment is justified 
by the fact that regression is measured in terms of the original 
units of the data, while correlation is expressed as a ratio or 
abstract number. 

A simple problem will best illustrate this idea. 

Meaning of symbols:

$Y'$ = estimated values of Y, obtained from the regression equation, $Y' = a + bX$, etc.
$\sigma_{y'}$ = standard deviation of the estimated $Y'$-values.
$\sigma_y$ = standard deviation of the original Y's, the Y-values of the original data.




[Fig. 57. Perfect Correlation. Showing location of actual and estimated values as identical. Correlation is 1.]

[Fig. 58. Less than Perfect Correlation. Showing location of actual and estimated values not identical. Correlation less than 1.]


Regression for Fig. 57

     X      Y      XY      X²
     4     3.0     12      16
     6     4.5     27      36
     8     6.0     48      64
    10     7.5     75     100
    12     9.0    108     144
Σ   40    30.0    270     360
Means  8     6
Corrections           240     320
Deviations             30      40

Regression for Fig. 58

     X      Y      XY      X²
     4     1.0      4      16
     6     6.5     39      36
     8     6.0     48      64
    10     9.5     95     100
    12     7.0     84     144
Σ   40    30.0    270     360
Means  8     6
Corrections           240     320
Deviations             30      40

For both figures:

$$b = \frac{\Sigma xy}{\Sigma x^2} = \frac{30}{40} = .75$$

$$a = \bar{Y} - b\bar{X} = 6 - .75 \cdot 8 = 6 - 6 = 0$$

$$Y' = a + bX = 0 + .75X$$

Standard Deviation of Y for Fig. 57

    Y   −  Ȳ  =    y       y²
   3.0  −  6  =  −3.0     9.00
   4.5  −  6  =  −1.5     2.25
   6.0  −  6  =   0       0
   7.5  −  6  =  +1.5     2.25
   9.0  −  6  =  +3.0     9.00
                   Σy² = 22.50

$$\sigma_y^2 = \frac{\Sigma y^2}{N - 1} = \frac{22.50}{4} = 5.625, \qquad \sigma_y = \sqrt{5.625}$$

Standard Deviation of Y for Fig. 58

    Y   −  Ȳ  =    y       y²
   1.0  −  6  =  −5.0    25.00
   6.5  −  6  =  +0.5     0.25
   6.0  −  6  =   0       0
   9.5  −  6  =  +3.5    12.25
   7.0  −  6  =  +1.0     1.00
                   Σy² = 38.50

$$\sigma_y^2 = \frac{\Sigma y^2}{N - 1} = \frac{38.50}{4} = 9.625, \qquad \sigma_y = \sqrt{9.625}$$

In the above example the two regression lines are identical, but the deviations of the data from those lines are not identical in the two cases. The wider the data scatter about the line, the smaller the correlation.
Standard Error of Estimate on Y for Fig. 57

    X     Y  −  Y'  =   z     z²
    4    3.0 − 3.0  =   0     0
    6    4.5 − 4.5  =   0     0
    8    6.0 − 6.0  =   0     0
   10    7.5 − 7.5  =   0     0
   12    9.0 − 9.0  =   0     0
                     Σz² =    0

$$S_y^2 = \frac{\Sigma z^2}{N - 1} = \frac{0}{4} = 0$$

Standard Error of Estimate on Y for Fig. 58

    X     Y  −  Y'  =   z     z²
    4    1.0 − 3.0  =  −2     4
    6    6.5 − 4.5  =  +2     4
    8    6.0 − 6.0  =   0     0
   10    9.5 − 7.5  =  +2     4
   12    7.0 − 9.0  =  −2     4
                     Σz² =   16

$$S_y^2 = \frac{\Sigma z^2}{N - 1} = \frac{16}{4} = 4$$
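The two small problems can be checked with a short Python sketch. It is an illustration under stated assumptions, not the author's worksheet: `fit_line` and `variances` are hypothetical names, and, following the text, both $\sigma_y^2$ and $S_y^2$ are divided by N − 1.

```python
def fit_line(xs, ys):
    """Least squares line Y' = a + bX, using sums and 'corrections' as in the worksheets."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    dev_xy = sum_xy - sum_x * sum_y / n   # Σxy = ΣXY minus its correction
    dev_x2 = sum_x2 - sum_x ** 2 / n      # Σx² = ΣX² minus its correction
    b = dev_xy / dev_x2
    a = sum_y / n - b * sum_x / n         # a = Ȳ - bX̄
    return a, b


def variances(xs, ys):
    """Return (σy², Sy²), both with N - 1 in the denominator as in the text."""
    n = len(ys)
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    s2_y = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 1)
    return var_y, s2_y


X = [4, 6, 8, 10, 12]
Y_FIG57 = [3.0, 4.5, 6.0, 7.5, 9.0]
Y_FIG58 = [1.0, 6.5, 6.0, 9.5, 7.0]

for name, ys in (("Fig. 57", Y_FIG57), ("Fig. 58", Y_FIG58)):
    a, b = fit_line(X, ys)
    var_y, s2_y = variances(X, ys)
    print(f"{name}: Y' = {a:.2f} + {b:.2f}X, sigma_y^2 = {var_y:.3f}, S_y^2 = {s2_y:.3f}")
```

Both data sets give the same line, Y' = 0 + .75X; they differ only in the scatter about it (Sy² of 0 against 4).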


In Fig. 57 all the actual data and all the estimated values fall on the regression line $Y' = 0 + .75X$, and $S_y = 0$. In this case the standard deviation of the Y-values is $\sqrt{5.625}$. Since the estimated values of Y, or $Y'$, are identical with the actual values of Y, the standard deviation of $Y'$ is also $\sqrt{5.625}$. From this logical relationship the standard, or basic, formulas for determination and correlation may be derived as follows:


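Determination

$$r^2 = \frac{\sigma_{y'}^2}{\sigma_y^2} = \frac{5.625}{5.625} = 1$$

Correlation

$$r = \frac{\sigma_{y'}}{\sigma_y} = \frac{\sqrt{5.625}}{\sqrt{5.625}} = 1$$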


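The determination will be one, however, only when $S_y = 0$ and the Y- and $Y'$-values are identical. In Fig. 58 this is not the case. Here $\sigma_y^2 = 9.625$ and $\sigma_{y'}^2 = 5.625$. Therefore,

$$r^2 = \frac{\sigma_{y'}^2}{\sigma_y^2} = \frac{5.625}{9.625} = .584 = 58.4\%$$

Formula No. 39 (for determination)

$$r^2 = \frac{\sigma_{y'}^2}{\sigma_y^2}$$

Formula No. 40 (for correlation)

$$r = \frac{\sigma_{y'}}{\sigma_y}$$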
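Formulas No. 39 and No. 40 can be sketched in Python for the Fig. 58 data. The helper `variance` is a hypothetical name; it divides by N − 1 as the worked example does, although the ratio comes out the same whichever divisor is used, since the divisor cancels.

```python
import math

X = [4, 6, 8, 10, 12]
Y = [1.0, 6.5, 6.0, 9.5, 7.0]          # actual values, Fig. 58
Y_EST = [0.0 + 0.75 * x for x in X]    # Y' from the regression Y' = 0 + .75X


def variance(values):
    """Squared standard deviation with N - 1 in the denominator, as in the text."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


r2 = variance(Y_EST) / variance(Y)   # Formula No. 39: determination
r = math.sqrt(r2)                    # Formula No. 40: correlation
print(f"r^2 = {r2:.3f}, r = {r:.3f}")
```

This prints r² = 0.584, as in the text, and an r of about .76.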






CORRELATION AND DETERMINATION 


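These relationships may be restated as follows:

$$r^2 = \frac{\sigma_{y'}^2}{\sigma_y^2}$$

But the total variation of Y is the sum of the explained and the unexplained variation,

$$\sigma_y^2 = \sigma_{y'}^2 + S_y^2$$

therefore, transposing terms and solving for $\sigma_{y'}^2$, we have $\sigma_{y'}^2 = \sigma_y^2 - S_y^2$, and

Formula No. 41 (for determination)

$$r^2 = \frac{\sigma_y^2 - S_y^2}{\sigma_y^2} = 1 - \frac{S_y^2}{\sigma_y^2}$$

Formula No. 42 (for correlation)

$$r = \sqrt{1 - \frac{S_y^2}{\sigma_y^2}}$$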


These formulas are based on the following facts and relationships:

$\sigma_y^2$ = the total amount of variation in Y that is to be explained.

$\sigma_{y'}^2$ = that portion of the total variation of Y that is explained by X.

$S_y^2$ = that portion of the total variation of Y that is not explained by X.


Since the amount of variation that is explained by X plus the amount that is not explained by X equals the total amount of variation in Y, it necessarily follows that the squared standard error of estimate over the squared standard deviation of Y, $S_y^2/\sigma_y^2$, is the supplement of the coefficient of determination. In exact proportion as one becomes larger, the other must become smaller. As r approaches 0, $S_y^2/\sigma_y^2$ approaches 1, and as r approaches 1, $S_y^2/\sigma_y^2$ approaches 0.
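A minimal sketch, assuming the Fig. 58 data and the fitted line Y' = 0 + .75X from the worked example, shows the partition on which Formulas No. 41 and No. 42 rest. The identity holds here because Y' is a least squares line, so the mean of the Y' values equals Ȳ; for an arbitrary line it would not.

```python
X = [4, 6, 8, 10, 12]
Y = [1.0, 6.5, 6.0, 9.5, 7.0]          # Fig. 58 data
Y_EST = [0.0 + 0.75 * x for x in X]    # estimated values on Y' = 0 + .75X

n = len(Y)
mean_y = sum(Y) / n
total = sum((y - mean_y) ** 2 for y in Y) / (n - 1)                  # σy², to be explained
explained = sum((e - mean_y) ** 2 for e in Y_EST) / (n - 1)          # σy'², explained by X
unexplained = sum((y - e) ** 2 for y, e in zip(Y, Y_EST)) / (n - 1)  # Sy², not explained by X

print(total, explained + unexplained)   # the two parts add up to the total
print(1 - unexplained / total)          # Formula No. 41, equal to Formula No. 39
```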

Formulas No. 39 and No. 40 are rarely used in actual statistical computations, although they are the actual basis of all correlation formulas. Formulas No. 41 and No. 42 are widely used because they are easily manipulated in connection with regression computations. Since no regression line is complete without its standard error of estimate, $S_y$ must always be computed. With this value at hand, the computation of

$$r^2 = 1 - \frac{S_y^2}{\sigma_y^2}$$

is almost no additional work.

Another relationship among the three values $\sigma_y$, $S_y$, and r should be noted at this point.

Formula No. 43

$$S_y = \sigma_y\sqrt{1 - r^2}$$

Since $r^2$ and $S_y^2/\sigma_y^2$ are supplements and together equal 1, as $r^2$ becomes larger, $S_y$ becomes smaller, and vice versa. When $r^2 = 0$, $S_y = \sigma_y$. When $r^2 = 1$, $S_y = 0$. There are cases in which the correlation is computed before the regression. In such cases the above formula is the briefest possible method for computing the standard error of estimate.
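Formula No. 43 lends itself to a one-line function. In this sketch `std_error_of_estimate` is a hypothetical name, and the inputs are the Fig. 58 values from the text.

```python
import math


def std_error_of_estimate(sigma_y, r):
    """Formula No. 43: Sy = σy * sqrt(1 - r²)."""
    return sigma_y * math.sqrt(1 - r ** 2)


# Fig. 58 values from the text: σy² = 9.625 and r² = 5.625 / 9.625
sigma_y = math.sqrt(9.625)
r = math.sqrt(5.625 / 9.625)
print(round(std_error_of_estimate(sigma_y, r), 3))   # 2.0, the same Sy found directly (Sy² = 4)
```

The two limiting cases in the paragraph also follow: with r = 0 the function returns σy, and with r = 1 it returns 0.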


FORMULA FOR PEARSONIAN¹ COEFFICIENT OF CORRELATION

Formula No. 44

$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y}$$

¹ Named for Karl Pearson, who originated it.


This formula is obtained from the formula $r = \sigma_{y'}/\sigma_y$ by algebraic manipulation as follows:

By shifting the measurement of the relationship between X and Y on a straight least squares regression line from the intersection of the X-axis with the Y-axis, or zero, to the intersection of the two means $\bar{Y}$ and $\bar{X}$, the equation $Y = a + bX$ is reduced to $y = bx$. Since a is the measure of the Y intercept, it disappears from the equation when we leave the Y-axis, or the point of origin. If from the equation

$$Y = a + bX$$

we subtract

$$\bar{Y} = a + b\bar{X}$$

the result is

$$Y - \bar{Y} = b(X - \bar{X})$$

which is $y = bx$. We have shifted from the large X and Y of the original data to the deviations from the mean, as shown in Fig. 59.

[Fig. 59. Showing shift from Y = a + bX to y = bx.]

In both equations, $Y = a + bX$ and $y = bx$, the slope of the regression line, b, is the same. The only difference is that in the first case the relationship between X and Y is measured in terms of the original data, while in the second case it is measured in deviations from the means, $\bar{Y}$ and $\bar{X}$, and, therefore, in terms of $Y - \bar{Y} = y$ and $X - \bar{X} = x$. The estimated values for Y are no longer $Y'$ but $y'$. Since $\sigma_{y'}$ is based on $y'$, and $y' = bx$, it follows that bx may be substituted for its equivalent $y'$. Therefore, the equation

$$r = \frac{\sigma_{y'}}{\sigma_y}$$

may be written

$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y}$$


This formula is easy to compute, but is limited to straight re- 
gression lines. It is widely used in such relationships. 


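WORKSHEET NO. 39

Detailed Worksheet for Using Individual Items in Computation of the Pearsonian Coefficient of Correlation

    X  −  X̄  =   x     x²        Y  −  Ȳ  =   y     y²      xy
   12  − 10  =  +2      4       14  −  9  =  +5     25     +10
    9  − 10  =  −1      1        8  −  9  =  −1      1      +1
    8  − 10  =  −2      4        6  −  9  =  −3      9      +6
   10  − 10  =   0      0        9  −  9  =   0      0       0
   11  − 10  =  +1      1       11  −  9  =  +2      4      +2
   13  − 10  =  +3      9       12  −  9  =  +3      9      +9
    7  − 10  =  −3      9        3  −  9  =  −6     36     +18
               Σx² =   28                  Σy² =   84  Σxy = 46

$$\sigma_x = \sqrt{\frac{\Sigma x^2}{N}} = \sqrt{\frac{28}{7}} = \sqrt{4} = 2 \qquad \sigma_y = \sqrt{\frac{\Sigma y^2}{N}} = \sqrt{\frac{84}{7}} = \sqrt{12} = 3.464$$

$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y} = \frac{46}{7 \times 2 \times 3.464} = \frac{46}{48.496} = .95$$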


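¹ The transformation is

$$r^2 = \frac{\sigma_{y'}^2}{\sigma_y^2} = \frac{\Sigma (bx)^2 / N}{\sigma_y^2} = \frac{b^2\,\Sigma x^2}{N\sigma_y^2} = \left(\frac{\Sigma xy}{\Sigma x^2}\right)^2 \frac{\Sigma x^2}{N\sigma_y^2} = \frac{(\Sigma xy)^2}{\Sigma x^2 \cdot N\sigma_y^2}$$

but in the denominator $N\sigma_y^2$ equals $\Sigma y^2$. Therefore,

$$r^2 = \frac{(\Sigma xy)^2}{\Sigma x^2 \cdot \Sigma y^2}$$

Extracting the root,

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}} = \frac{\Sigma xy}{\sqrt{N\sigma_x^2 \cdot N\sigma_y^2}} = \frac{\Sigma xy}{N\sigma_x\sigma_y}$$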
The above worksheet may be used for small samples, but a
shorter and preferable method is available. The only advantage 
of Worksheet No. 39 is that it shows in detail all the steps used in 
computing the Pearsonian coefficient. It is like placing the en- 
tire operation under a microscope. 
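The deviation method of Worksheet No. 39 translates directly into code. This is a sketch, not a production routine: `pearson_r` is a hypothetical name, and σ is computed with N in the denominator, as the worksheet does.

```python
import math

X = [12, 9, 8, 10, 11, 13, 7]
Y = [14, 8, 6, 9, 11, 12, 3]


def pearson_r(xs, ys):
    """Formula No. 44: r = Σxy / (N σx σy), with σ computed using N as in Worksheet No. 39."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    x_dev = [x - mean_x for x in xs]          # x = X - X̄
    y_dev = [y - mean_y for y in ys]          # y = Y - Ȳ
    sum_xy = sum(dx * dy for dx, dy in zip(x_dev, y_dev))
    sigma_x = math.sqrt(sum(dx * dx for dx in x_dev) / n)
    sigma_y = math.sqrt(sum(dy * dy for dy in y_dev) / n)
    return sum_xy / (n * sigma_x * sigma_y)


print(round(pearson_r(X, Y), 2))   # 0.95, as in Worksheet No. 39
```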

In practical statistical computations, the following method 
should be used for individual items. 


WORKSHEET NO. 40

      X      Y      XY      X²      Y²
     12     14     168     144     196
      9      8      72      81      64
      8      6      48      64      36
     10      9      90     100      81
     11     11     121     121     121
     13     12     156     169     144
      7      3      21      49       9
Σ    70     63     676     728     651


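Formula No. 45

$$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N} = 728 - \frac{(70)^2}{7} = 728 - \frac{4900}{7} = 728 - 700 = 28$$

Formula No. 46

$$\Sigma y^2 = \Sigma Y^2 - \frac{(\Sigma Y)^2}{N} = 651 - \frac{(63)^2}{7} = 651 - \frac{3969}{7} = 651 - 567 = 84$$

Formula No. 47

$$\Sigma xy = \Sigma XY - \frac{\Sigma X \cdot \Sigma Y}{N} = 676 - \frac{70 \cdot 63}{7} = 676 - \frac{4410}{7} = 676 - 630 = 46$$

Formula No. 48

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}} = \frac{46}{\sqrt{28 \cdot 84}} = \frac{46}{\sqrt{2352}} = \frac{46}{48.5} = .95$$

Formula No. 49

$$r^2 = \frac{(\Sigma xy)^2}{\Sigma x^2 \cdot \Sigma y^2} = \frac{(46)^2}{28 \times 84} = \frac{2116}{2352} = .90$$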
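The arithmetic of Worksheet No. 40 and Formulas No. 45 to No. 49 can be sketched as follows, assuming the seven pairs shown in the worksheet. The slope and intercept lines are included only to illustrate the claim that the regression line comes from the same sums; they use $b = \Sigma xy/\Sigma x^2$ and $a = \bar{Y} - b\bar{X}$ as earlier in the chapter.

```python
import math

X = [12, 9, 8, 10, 11, 13, 7]
Y = [14, 8, 6, 9, 11, 12, 3]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)                 # ΣX = 70, ΣY = 63
sum_x2 = sum(x * x for x in X)                # ΣX² = 728
sum_y2 = sum(y * y for y in Y)                # ΣY² = 651
sum_xy = sum(x * y for x, y in zip(X, Y))     # ΣXY = 676

dev_x2 = sum_x2 - sum_x ** 2 / n              # Formula No. 45: Σx² = 28
dev_y2 = sum_y2 - sum_y ** 2 / n              # Formula No. 46: Σy² = 84
dev_xy = sum_xy - sum_x * sum_y / n           # Formula No. 47: Σxy = 46

r = dev_xy / math.sqrt(dev_x2 * dev_y2)       # Formula No. 48
r2 = dev_xy ** 2 / (dev_x2 * dev_y2)          # Formula No. 49

b = dev_xy / dev_x2                           # slope of Y = a + bX
a = sum_y / n - b * sum_x / n                 # intercept, a = Ȳ - bX̄
print(f"r = {r:.2f}, r^2 = {r2:.2f}, Y = {a:.2f} + {b:.3f}X")
```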

Worksheet No. 40 and the formulas associated with it are among the most useful methods in statistics. From it, with a minimum of machine work and time, may be computed the arithmetic means, the standard deviations, the regression line $Y = a + bX$, the standard error of estimate $S_y = \sigma_y\sqrt{1 - r^2}$, and the coefficient of correlation, $r = \Sigma xy / \sqrt{\Sigma x^2 \cdot \Sigma y^2}$.

The student should become thoroughly familiar with it and use it whenever he works a sample on the basis of individual items.

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}}$$

is a better formula than

$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y}$$

for most cases, because it saves one extraction of square root and gives the same result as the Pearsonian form.

FORMULA FOR UNGROUPED DATA 

It is not necessary to reduce the data to deviations from the 
mean in order to compute correlation. The formula may be so 
altered in form without changing its value that the sums of the 
original data and their squares and products as given in Work- 
sheet No. 40 may be used. 

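Formula No. 50

$$r = \frac{N\Sigma XY - (\Sigma X \cdot \Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2]\,[N\Sigma Y^2 - (\Sigma Y)^2]}}$$

$$= \frac{7 \cdot 676 - 70 \cdot 63}{\sqrt{[7 \cdot 728 - (70)^2]\,[7 \cdot 651 - (63)^2]}} = \frac{4732 - 4410}{\sqrt{[5096 - 4900]\,[4557 - 3969]}}$$

$$= \frac{322}{\sqrt{196 \cdot 588}} = \frac{322}{\sqrt{115{,}248}} = \frac{322}{339.5} = .95$$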


The ungrouped data formula maintains throughout the same relationships that are used in the Pearsonian coefficient formula,

$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y}$$

but in the ungrouped data formula these relationships are expressed in the large sums of the original data, while in the Pearsonian coefficient they are expressed as deviations from the mean.

1. The advantages of the ungrouped data method are that it 
(a) saves the time of computing the deviations from the mean 
and (b) eliminates all or most of the decimal fractions. 

2. Its disadvantages are that it (a) runs into large numbers, which are difficult to get on the calculating machine, and (b) looks long and difficult to beginners. It is an excellent method for a small sample when the individual items are used.
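Formula No. 50 needs only the raw sums, and with whole-number data every quantity stays an integer until the final square root, which is the advantage claimed in point 1(b). The function name `r_ungrouped` is hypothetical; the data are the seven pairs of Worksheet No. 40.

```python
import math


def r_ungrouped(xs, ys):
    """Formula No. 50: r from raw sums, without deviations from the mean."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy                                        # 7·676 - 70·63 = 322
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))  # √(196·588)
    return numerator / denominator


X = [12, 9, 8, 10, 11, 13, 7]
Y = [14, 8, 6, 9, 11, 12, 3]
print(round(r_ungrouped(X, Y), 2))   # 0.95
```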


CORRELATION TABLE

Another device that is very useful in computing the coefficient of correlation and all related measures at the same time is the correlation table. It is of special value if one has a large sample which must be thrown into class intervals for economy of manipulation. One can manage a sample of ten thousand or even a hundred thousand items as easily, after the class intervals and tally sheet have been made, as he can thirty or a hundred items.

Several forms of this type of table are available in different 
texts and statistical journals. The form given below is as useful 
and easy to manage as any. The first step is the creation of a
cross-classification tally sheet as follows: 

WORKSHEET NO. 41

Cross-Classification Tally Sheet for Height and Weight of 75 Grade School Children (Height X; Weight Y)

                                 Height X
Weight Y     40-43   44-47   48-51   52-55   56-59   60-63     Fy
  35-54        1      10      15                                26
  55-74                        6      22       3                31
  75-94                                2       8       2        12
  95-114                                       1       2         3
 115-134                               1                         1
 135-154                                       1       1         2
Fx             1      10      21      25      13       5        75

(The tally marks of the original sheet are shown here as counts.)


After the tabulation and totals have been completed in the above cross-classification tally sheet, the class intervals with their appropriate frequencies and totals are set up as part of the Correlation Table shown in Worksheet No. 42, from which may be computed $\bar{X}$, $\bar{Y}$, $\sigma_x$, $\sigma_y$, r, $r^2$, $b_{yx}$, $b_{xy}$, $S_y$, and $Y = a + bX$.

WORKSHEET NO. 42
CORRELATION TABLE

Correlation Between Height (X) and Weight (Y) of 75 Stillwater School Children of Grades 1 to 6

                              Height X
Weight Y   40-43  44-47  48-51  52-55  56-59  60-63    Fy    dy    fdy   fdy²   ΣdxFy   fdxdy
  35-54      1     10     15                           26    −1    −26    26     −38      38
  55-74                    6     22      3             31     0      0     0      −3       0
  75-94                          2       8      2      12    +1    +12    12     +12      12
  95-114                                 1      2       3    +2     +6    12      +5      10
 115-134                         1                      1    +3     +3     9       0       0
 135-154                                 1      1       2    +4     +8    32      +3      12
Fx           1     10     21     25     13      5      75           +3    91     −21      72
dx          −3     −2     −1      0     +1     +2
fdx         −3    −20    −21      0    +13    +10                  −21
fdx²         9     40     21      0     13     20                  103
ΣdyFx       −1    −10    −15     +5    +14    +10                   +3
fdxdy        3     20     15      0     14     20                   72

Y class interval = 20; X class interval = 4.

                          Σdx f    Σdy f    Σdx² f    Σdy² f    Σdx dy f
Sums                       −21      +3       103        91         72
Means of sums              −.28     +.04
Corrections (subtracted)                      5.88       0.12      −.84
Corrected values                             97.12      90.88      72.84

$$b_{yx}\ \text{(in class intervals)} = \frac{\Sigma xy}{\Sigma x^2} = \frac{72.84}{97.12} = .75$$

$$b_{yx}\ \text{(in terms of original data)} = .75 \times 5 = 3.75$$

$$\bar{X} = A + \frac{\Sigma f d_x}{N} \times 4 = 54 + \frac{-21}{75} \times 4 = 54 - 1.12 = 52.88$$

$$\bar{Y} = A + \frac{\Sigma f d_y}{N} \times 20 = 65 + \frac{3}{75} \times 20 = 65 + .8 = 65.8$$

$$\sigma_x = 4\sqrt{\frac{97.12}{75}} = 4 \times 1.138 = 4.55 \qquad \sigma_y = 20\sqrt{\frac{90.88}{75}} = 20 \times 1.10 = 22.0$$

$$a = \bar{Y} - b\bar{X} = 65.8 - 3.75 \times 52.88 = -132.5$$

$$Y' = -132.5 + 3.75X$$

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}} = \frac{72.84}{\sqrt{97.12 \times 90.88}} = \frac{72.84}{93.95} = .775$$

Explanation of Worksheet No. 42

The captions of the columns and the stubs of the lines are largely self-explanatory. This worksheet is a device for computing two standard deviations at the same time, one on X, the other on Y. Its elements are similar to those in Worksheet No. 32 for computing the single standard deviation, with the exception that two such worksheets are combined here. One of them is horizontal; the other is vertical. The totals in the bottom lines supply the X standard deviation. The totals on the right supply the Y standard deviation. Since the short method with class deviations from an assumed mean is used, it is necessary to correct for this error just as in Worksheets No. 16 and No. 32. The corrections are computed in the following form:

                          Σdx f    Σdy f    Σdx² f    Σdy² f    Σdx dy f
Sums                       −21      +3       103        91         72
Means                      −.28     +.04
Corrections (subtracted)                      5.88       0.12      −.84
Corrected values                             97.12      90.88      72.84

The sums used above are obtained from the appropriate columns and lines in the worksheet. It will be noted that the sums for $\Sigma d_x f$, $\Sigma d_y f$, and $\Sigma d_x d_y f$ appear in both the X and Y totals. This device gives a check on the correctness of these totals. Any discrepancy in these paired sums indicates an error in the computations of the worksheet. These corrected values are used for all further computations.

In computing the corrections above, the product of the −21 times its mean, −.28, is the correction, 5.88, for $\Sigma d_x^2 f$. The product of the +3 times its mean, +.04, is the correction, 0.12, for $\Sigma d_y^2 f$. The product of the −21 times the mean +.04, that is, the sum of $\Sigma d_x f$ times the mean of $\Sigma d_y f$, is the correction, −.84, for $\Sigma d_x d_y f$.

Since the value for $b_{yx}$ is computed from the corrected totals in class intervals, it must be multiplied by the quotient of the class intervals, as follows:

$$b_{yx} = \frac{\Sigma xy}{\Sigma x^2} = \frac{72.84}{97.12} = .75$$

$$\text{Class interval correction} = \frac{Y\ \text{class}}{X\ \text{class}} = \frac{20}{4} = 5$$

$b_{yx}$ corrected = .75 × 5 = 3.75, in terms of pounds of weight for each inch of height. All the other computations have been explained in earlier chapters.
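The bookkeeping of Worksheet No. 42 can be sketched in Python. The cell frequencies are those of the worksheet; the assumed means (54 inches and 65 pounds) and the class widths are taken from the worked computations, and all names are illustrative only. The corrections are the same products of sums and means described above.

```python
import math

# Cell frequencies of Worksheet No. 42: one row per weight class, one column per height class.
DX = [-3, -2, -1, 0, 1, 2]   # heights 40-43 ... 60-63; assumed-mean class 52-55 (54 in.)
DY = [-1, 0, 1, 2, 3, 4]     # weights 35-54 ... 135-154; assumed-mean class 55-74 (65 lb.)
CELLS = [
    [1, 10, 15, 0, 0, 0],    # 35-54
    [0, 0, 6, 22, 3, 0],     # 55-74
    [0, 0, 0, 2, 8, 2],      # 75-94
    [0, 0, 0, 0, 1, 2],      # 95-114
    [0, 0, 0, 1, 0, 0],      # 115-134
    [0, 0, 0, 0, 1, 1],      # 135-154
]
X_CLASS, Y_CLASS = 4, 20     # class-interval widths in inches and pounds


def table_sums(cells, dx, dy):
    """Return N, Σfdx, Σfdy, Σfdx², Σfdy², Σfdxdy in class-interval units."""
    n = s_x = s_y = s_xx = s_yy = s_xy = 0
    for j, row in enumerate(cells):
        for i, f in enumerate(row):
            n += f
            s_x += f * dx[i]
            s_y += f * dy[j]
            s_xx += f * dx[i] ** 2
            s_yy += f * dy[j] ** 2
            s_xy += f * dx[i] * dy[j]
    return n, s_x, s_y, s_xx, s_yy, s_xy


n, s_x, s_y, s_xx, s_yy, s_xy = table_sums(CELLS, DX, DY)
# Subtract the corrections (each sum times the appropriate mean)
dev_xx = s_xx - s_x * (s_x / n)     # 103 - 5.88 = 97.12
dev_yy = s_yy - s_y * (s_y / n)     # 91 - 0.12 = 90.88
dev_xy = s_xy - s_x * (s_y / n)     # 72 - (-0.84) = 72.84

r = dev_xy / math.sqrt(dev_xx * dev_yy)
b_yx = dev_xy / dev_xx * (Y_CLASS / X_CLASS)   # pounds per inch
mean_x = 54 + s_x / n * X_CLASS                # 52.88
mean_y = 65 + s_y / n * Y_CLASS                # 65.8
a = mean_y - b_yx * mean_x
print(f"r = {r:.3f}, Y' = {a:.1f} + {b_yx:.2f}X")
```

The same cell counts would serve for a sample of any size, which is the point the text makes about large samples: only the frequencies grow, not the number of classes.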


Regression Formula for Correlation

Formula No. 51 for determination

$$r^2 = b_{yx} \cdot b_{xy}$$

Formula No. 52 for correlation

$$r = \sqrt{b_{yx} \cdot b_{xy}}$$

Both of these b values can be computed from Worksheet No. 42. If one has already computed the b's for other purposes, this is the easiest method for computing the coefficient of correlation.



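Fig. 60. Normal equations for computation of Y = a + bX:

$$\Sigma Y = Na + b\Sigma X$$
$$\Sigma XY = a\Sigma X + b\Sigma X^2$$

Fig. 61. Normal equations for computation of X = a + bY:

$$\Sigma X = Na + b\Sigma Y$$
$$\Sigma XY = a\Sigma Y + b\Sigma Y^2$$

Since

$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y} = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}}$$

$$r^2 = \frac{\Sigma xy \cdot \Sigma xy}{\Sigma x^2 \cdot \Sigma y^2} = \left(\frac{\Sigma xy}{\Sigma x^2}\right)\left(\frac{\Sigma xy}{\Sigma y^2}\right) = b_{yx} \cdot b_{xy}$$

where $b_{yx} = \Sigma xy / \Sigma x^2$ and $b_{xy} = \Sigma xy / \Sigma y^2$.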

All of the above formulas for correlation are closely related and 
may be transformed algebraically one into the other. All of 
them give the same results. The statistician will ordinarily use 
the one which is most convenient for the type of calculations he 
finds desirable to make. 

If one wishes only the coefficient of correlation or determina- 
tion and the sample is not large, not over 50 or 100 items, For- 
mula No. 44 or No. 48 is best. 
If the sample is quite large and the regression lines are also desired, the correlation table and associated formulas are best. A
thousand or even ten thousand or more items can be manipulated 
quite easily by this method. It is especially useful in economics, 
education, sociology, and other social sciences where large sam- 
ples are often used. 

If the standard error of estimate, $S_y$, has already been computed or is required, Formulas No. 41 and No. 42 are best. In most cases, Formulas No. 39 and No. 40 are less convenient to use, but sometimes may be useful.


RANK DIFFERENCE METHOD OF CORRELATION 

Another measure of correlation between two variables, based on a comparison of separate pairs of items, is the rank difference method shown below. The pairs of items are ranked for each variable. The largest item is given a rank of 1, the next largest item a rank of 2, the next 3, and so on to the smallest item.¹ The difference of the paired ranks is then taken by subtracting the second variable rank from the first variable rank. These D's are squared and summed. If the ranks are all identical for the two variables, the differences are all zero (0) and the correlation equals 1. This method is useful for smaller samples using individual items where a high degree of accuracy is not required. It is the simplest and shortest of all the methods presented in this text. It is not suited to very large samples.

WORKSHEET NO. 43

Computation of Correlation of Rank of Eleven States* in Acreage in Wheat and Corn, 1928-32

                Wheat Acreage     Corn Acreage      Rank    Rank        D
State           Average 1928-32   Average 1928-32   Wheat   Corn   (Difference)    D²
                (100,000)         (100,000)
Kansas               133                69             1       3        −2            4
Oklahoma              47                32             2       8        −6           36
Texas                 39                48             3       5        −2            4
Nebraska              37                98             4       1        +3            9
Illinois              22                91             5       2        +3            9
Ohio                  18                35             6       7        −1            1
Indiana               18                45             7       6        +1            1
Missouri              17                62             8       4        +4           16
Colorado              15                17             9       9         0            0
Washington            13                 1            10      11        −1            1
Pennsylvania          10                13            11      10        +1            1
                                                                     ΣD² =          82

* Statistics of Agriculture, 1936, pp. 9, 34.


Formula No. 53

$$\rho = 1 - \frac{6\,\Sigma D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 82}{11(11^2 - 1)} = 1 - \frac{492}{1320} = 1 - .37 = .63$$
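The rank difference method, together with the rule for ties given in the footnote, can be sketched in Python. `average_ranks` and `rank_difference_r` are hypothetical names. The sketch reproduces Worksheet No. 43 from the ranks as printed; note that the printed wheat acreages for Ohio and Indiana are both 18, so the footnote's rule would give each a rank of 6.5, whereas the worksheet ranks them 6 and 7.

```python
def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2      # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_difference_r(ranks_a, ranks_b):
    """Formula No. 53: 1 - 6ΣD² / (N(N² - 1)); also returns ΣD²."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2


# Ranks exactly as printed in Worksheet No. 43 (Kansas ... Pennsylvania)
WHEAT_RANK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
CORN_RANK = [3, 8, 5, 1, 2, 7, 6, 4, 9, 11, 10]
rho, sum_d2 = rank_difference_r(WHEAT_RANK, CORN_RANK)
print(sum_d2, round(rho, 2))   # 82 0.63

# The corn acreages have no ties, so ranking them directly recovers the printed ranks
CORN_ACRES = [69, 32, 48, 98, 91, 35, 45, 62, 17, 1, 13]
print(average_ranks(CORN_ACRES) == [float(r) for r in CORN_RANK])   # True
```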


SUMMARY 

1. The coefficient of determination is a percentage figure which measures the amount of variation in one variable, called the dependent variable, which is associated with, or varies with, variation in another variable, called the independent variable. For example, the coefficient of determination would measure the percentage of the variation in plant growth in an area that is associated with variation of rainfall in the identical area. Its symbol is $r^2$.

2. The coefficient of correlation is the square root of the coefficient 
of determination and is, therefore, not a percentage figure. Its symbol 
is r. 

3. The base of both the coefficient of determination and the coefficient 
of correlation is unity, or 100%. When the coefficients of correlation and 
determination are unity (1), the variation in the one factor is identical 
with the variation in the other factor. When the coefficients are less 
than unity, part of the variation in the dependent variable may not be 
accounted for by variation in the independent variable. 

¹ If two or more items have the same value, each is given the average of the ranks they occupy. If the third and fourth items are the same, both are ranked 3½ and the next item takes its normal rank of 5. If the fifth, sixth, and seventh items were all the same, each of the three would be ranked 6 and the next item ranked in its proper turn of 8.
4. The basic formula of the coefficient of determination rests on the ratio of the squared standard deviation of the estimated values of the dependent variable, computed from a regression line showing the average relationship between the two variables, to the squared standard deviation of the actual values of the dependent variable.



5. The formula of the Pearsonian coefficient of correlation, named for its originator, Karl Pearson, is $r = \dfrac{\Sigma xy}{N\sigma_x\sigma_y}$.

6. The coefficient of determination is equal to the product of the reversed regression coefficients: $r^2 = b_{yx} \cdot b_{xy}$.

7. The correlation table is a device for computing the correlation be- 
tween two variables which have been thrown into cross frequency dis- 
tributions. It is especially useful for large samples. 

8. The fact that there is correlation between two variables does not prove that there is a causal relationship between them, but only that their variations are associated.
REVIEW QUESTIONS

1. What is the difference between determination and correlation? 

2. In what respects is the coefficient of correlation misleading? 

3. What is positive correlation? Negative? Give two illustrations 
of each. 

4. What is the relation of correlation to regression? Why is it im- 
portant to fit the best possible regression line to a sample? 

5. Explain the formula $r^2 = \sigma_{y'}^2 / \sigma_y^2$.

6. What relation does the formula in question 5 bear to $r^2 = 1 - S_y^2/\sigma_y^2$?

7. What is the relation of the Pearsonian coefficient, $r = \dfrac{\Sigma xy}{N\sigma_x\sigma_y}$, to $r = \sigma_{y'}/\sigma_y$?

8. What are the advantages of Formula No. 49 over Formulas No. 44 and No. 50?

9. What are the advantages and disadvantages of a correlation table? 

10. What is the relationship of the standard error of estimate to the 
coefficient of correlation? Explain fully. 

11. Of what use is the coefficient of determination or correlation in 
scientific studies? Explain fully. 

12. What is the difference in the units of measurement of correlation 
and determination? 


EXERCISES 

1. Compute the correlation between the heights and weights of chil- 
dren given in Worksheet No. 1. 

2. Compute the determination between the age and weights of children 
given in Worksheet No. 1. 

3. Compute the determination between the two variables in each of 
the five problems in the Exercises Nos. 1 and 2 at the end of Chapter 11. 



CHAPTER 13 

TABULAR ANALYSIS 


A cross-classification table is the simplest and easiest device for 
discovering the more simple and apparent relationships which 
exist among variables. Tabular analysis is not the making of 
tables. It is the use of tables as analytical devices to discover and 
measure statistical relationship. If one variable changes in any 
constant or relative proportion to changes in another variable, 
this quantity can be measured by dividing one variable into class 
intervals and computing the averages of the associated values of 
the other variable for each class interval. The mathematics re- 
quired for this type of analysis is quite simple, consisting almost 
entirely of totals and simple averages. The principles upon which 
cross-classification tables are constructed were explained in de- 
tail in Chapter 7. A brief review of those materials at this 
point will aid the student in the present analysis. The easiest 
way for the beginning student to comprehend this method is to 
follow through the successive stages of a complete problem.


DATA TABLES 

Tables 17-25 inclusive reveal the steps required. Three types 
of tables are used. The first is the data table, which includes the 
material to be analyzed. Table 17 illustrates this method of as- 
sembling data. Figures for 85 farms in the hard winter wheat 
area were collected for (1) size of farm in acres, (2) number of 
milk cows, (3) wheat yields in bushels, and (4) rate of income. 
In Table 17 these data are recorded by farms as they were received from the schedules, without any thought or plan for logical arrangement. For tabular analysis this method and order of recording the data for the first time is quite sufficient. Regardless of the order in which the data first appear, they must be completely rearranged during the process of analysis.

TABLE 17. DATA TABLE

Farm Data on Size of Farm, Number of Cows, Yield of Wheat, and Rate of Income Used in Illustrating the Theory and Methods of Tabular Analysis

 No.   Size    Milk   Wheat   Rate of      No.   Size    Milk   Wheat   Rate of
       Acres   Cows   Yield   Income             Acres   Cows   Yield   Income
  1     160      7     12       1.6        44     180      2     20       2.5
  2     160      7     14        .1        45     402      4     17      10.5
  3     160      4     13       2.8        46     480      6     20       7.7
  4     160      4     18       1.0        47     450      1     14       3.7
  5     160      5     16       3.0        48     420      1     26      11.0
  6     160      4     20       2.3        49     640      2     12       3.0
  7     160      4     15       1.9        50     260      4     12       4.8
  8     160      4     11       1.4        51     395     42     26      14.7
  9     160      5     14       2.9        52     220      9     16       3.5
 10     160      2     11        .2        53     325     16     23       7.2
 11     160      5     13       6.5        54     520      8     17       9.0
 12     160      5     31      10.5        55     400     13     16       4.2
 13     160      6     15       5.7        56     285     18     27      16.9
 14     160      3     17       9.2        57     297     14     22       9.7
 15     160      2     16       8.0        58     307     11     20       5.2
 16     190      2      9        .6        59     215      8     20       6.5
 17     230      6     11       1.8        60     220      1     21       8.3
 18     400      2     14       5.6        61     255      9     22      10.0
 19     260      1     13       2.3        62     225      3     20       5.2
 20     220     13     24      17.8        63     400      8     25      14.6
 21     250      8     15       3.8        64     135      3     20       5.1
 22     271      2     20       9.1        65     148      7     14       4.0
 23     303      7     15       1.4        66     360      4     22       6.2
 24     165      5     16       8.9        67     280      6     23      10.2
 25     380      8     19      13.7        68     480      6     25       7.4
 26     150      6     20       6.6        69     305      7     18       5.6
 27     250      6     18       7.9        70     400      9     23      10.6
 28     390      9     18       8.3        71     165      6     12       1.8
 29     310      5     14       2.0        72     210      9     10        .8
 30     260      4     15       6.5        73     304      6     10       1.1
 31     480     12     11       2.8        74     250      7     24       5.0
 32     300      6     27       8.0        75     200      5     14       4.5
 33     155      6     18       8.9        76     230      6     25      13.2
 34     180      9     22       8.3        77     160      4     22       5.7
 35     450      8     14       6.4        78     320      4     26       8.4
 36     320      5     18       5.4        79     160     16     11       5.4
 37     320     10     25       7.9        80     160      4     10       1.0
 38     320     10     14       2.6        81     160      6     12       3.5
 39     240     14     18       2.5        82     160      6     12       2.0
 40     400     19     13       5.0        83     160      2     17      10.2
 41     262     12     22       6.2        84     160      6     11       4.3
 42     180      5     20       6.6        85     160      9     14       3.0
 43     232      6     12       3.1


ORGANIZATION TABLES 

The heart of the whole process of tabular analysis is the or- 
ganization table. In complex or long problems there may be 
several such tables. In this example three are sufficient. As was 
emphasized in Chapter 4 on “Organizing a Statistical Problem,” a large amount of thought should be given to the form and
contents of each organization table before the schedules are 
printed or the data collected. The statistician should decide in 
planning his study what data he wishes to collect and the exact 
form of organization table in which he expects to organize them. 
An organization table is an analytical device. It may be as simple 
or as complex as its purpose demands. For two variables a sim- 
ple one-way table such as Table 18 is sufficient. 

Class Intervals of Independent Variable 

The first point in designing an organization table after the data 
are at hand is to decide on the number and width of class intervals 
for the independent variable or variables. Table 18 is set up in 
the simplest form with only two variables. Wheat yields are 
the independent variable, and the class intervals are chosen for 
it. These may be of uniform or of varying width. The range in 
this case is from 9 to 31 bushels inclusive, or 23 bushels. This 
range in this case is divided into three class intervals as follows: 
(1) 14 bushels and under, (2) 15-21 bushels, and (3) 22 bushels 
and over. This gives a sufficiently equal distribution of the data 
among the three classes. There would be no objection to having 
four or even five or more classes. Three were chosen in this case 
to show a brief simple analysis. A larger number would ordi- 
narily be preferred because they would reveal the relationship in 
greater detail. However, too many classes make a table confus- 
ing and unwieldy. 




TABLE 18. ONE-WAY ORGANIZATION TABLE

Organization of the Data on the Two Variables, Wheat Yield and
Rate of Income, to Reveal the Cross Relationship Between Them

        14 bu. and Under    15 bu. to 21 bu.     22 bu. and Over
          Wheat   Rate of     Wheat   Rate of     Wheat   Rate of
          Yield    Income     Yield    Income     Yield    Income
          (bu.)       (%)     (bu.)       (%)     (bu.)       (%)

             12       1.6        18       1.0        31      10.5
             14       0.1        16       3.0        24      17.8
             13       2.8        20       2.3        27       8.0
             11       1.4        15       1.9        22       8.5
             14       2.9        15       5.7        23       7.9
             11       0.2        17       9.2        22       6.8
             13       6.5        16       8.0        26      11.0
              9       0.6        15       3.8        26      14.7
             11       1.8        20       9.1        23       7.2
             14       5.6        15       1.4        27      16.9
             13       2.3        16       8.9        22       9.7
             14       2.0        19      13.7        22      10.0
             11       2.8        20       6.6        25      14.6
             14       6.4        18       7.9        22       6.2
             14       2.6        18       8.3        23      10.2
             13       5.0        15       6.5        25       7.4
             12       3.1        18       8.9        23      10.6
             14       3.7        18       5.4        24       5.0
             12       3.0        18       2.5        25      13.2
             12       4.8        20       6.6        22       5.7
             14       4.0        20       2.5        26       8.4
             12       1.8        17      10.5
             10       0.8        20       7.7
             10       1.1        16       3.5
             14       4.5        17       9.0
             11       5.4        16       4.2
             10       1.0        20       5.2
             12       3.5        20       6.5
             12       2.0        21       8.3
             11       4.3        20       5.2
             14       3.0        20       5.1
                                 18       5.6
                                 17      10.2

Total       381      90.6       589     204.2       510     210.3
Means      12.3       2.9      17.8       6.2      24.3      10.0
Paired Values Under Each Class Interval

In Table 18, under each class interval of wheat yields is a double column of figures: the left-hand column contains the wheat yields of the farms falling in that class, and the right-hand column contains the rates of income for the same farms. Each column is totaled and a simple average is computed. These paired averages are the basis for comparing and measuring the relationship between the two variables, wheat yields and rate of income, for these 85 farms.
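The totals and simple averages at the foot of Table 18 can be checked with a short Python sketch. The paired values below are copied from Table 18; the helper name `summarize` is illustrative, and rounding to one decimal place is assumed in order to match the printed figures.

```python
# (wheat yield in bu., rate of farm income in %) pairs, copied from Table 18.
table_18 = {
    "14 bu. and under": [
        (12, 1.6), (14, 0.1), (13, 2.8), (11, 1.4), (14, 2.9), (11, 0.2), (13, 6.5), (9, 0.6),
        (11, 1.8), (14, 5.6), (13, 2.3), (14, 2.0), (11, 2.8), (14, 6.4), (14, 2.6), (13, 5.0),
        (12, 3.1), (14, 3.7), (12, 3.0), (12, 4.8), (14, 4.0), (12, 1.8), (10, 0.8), (10, 1.1),
        (14, 4.5), (11, 5.4), (10, 1.0), (12, 3.5), (12, 2.0), (11, 4.3), (14, 3.0),
    ],
    "15 bu. to 21 bu.": [
        (18, 1.0), (16, 3.0), (20, 2.3), (15, 1.9), (15, 5.7), (17, 9.2), (16, 8.0), (15, 3.8),
        (20, 9.1), (15, 1.4), (16, 8.9), (19, 13.7), (20, 6.6), (18, 7.9), (18, 8.3), (15, 6.5),
        (18, 8.9), (18, 5.4), (18, 2.5), (20, 6.6), (20, 2.5), (17, 10.5), (20, 7.7), (16, 3.5),
        (17, 9.0), (16, 4.2), (20, 5.2), (20, 6.5), (21, 8.3), (20, 5.2), (20, 5.1), (18, 5.6),
        (17, 10.2),
    ],
    "22 bu. and over": [
        (31, 10.5), (24, 17.8), (27, 8.0), (22, 8.5), (23, 7.9), (22, 6.8), (26, 11.0),
        (26, 14.7), (23, 7.2), (27, 16.9), (22, 9.7), (22, 10.0), (25, 14.6), (22, 6.2),
        (23, 10.2), (25, 7.4), (23, 10.6), (24, 5.0), (25, 13.2), (22, 5.7), (26, 8.4),
    ],
}


def summarize(pairs):
    """Frequency, column totals and simple averages for one class interval."""
    n = len(pairs)
    total_bu = sum(bu for bu, _ in pairs)
    total_pct = round(sum(pct for _, pct in pairs), 1)  # round away float noise
    return n, total_bu, total_pct, round(total_bu / n, 1), round(total_pct / n, 1)


summary = {label: summarize(pairs) for label, pairs in table_18.items()}
for label, (n, tot_bu, tot_pct, mean_bu, mean_pct) in summary.items():
    print(f"{label:18} farms={n:2}  total bu.={tot_bu:3}  total %={tot_pct:5.1f}  "
          f"mean bu.={mean_bu:4.1f}  mean %={mean_pct:4.1f}")

farms = sum(s[0] for s in summary.values())
income_total = round(sum(s[2] for s in summary.values()), 1)
print(f"All classes: farms={farms}, total %={income_total}, mean %={income_total / farms:.1f}")
```

The frequencies, totals, and averages it prints are the figures carried into Summary Table 19.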

TABLE 19
SUMMARY TABLE

Partial Summary of Relations Between Wheat Yields and Rate
of Income as Revealed in Construction Table No. 18

Class Intervals     Frequencies,    Totals for Dependent    Average Rate of
for Wheat Yields    No. of Farms    Factor, Farm Incomes    Farm Income (%)

14 bu. and under         31              90.6                  2.9
15 bu. to 21 bu.         33             204.2                  6.2
22 bu. and over          21             210.3                 10.0

Totals                   85             505.1                  5.9


SUMMARY TABLES 

The third type of table used in tabular analysis is the summary table. It is composed of material drawn from the organization table: class intervals, frequencies, totals, and averages. Table 19 consists of four columns: (1) class intervals of wheat yields, (2) frequencies, or the number of farms in each class, (3) the total of the dependent factor for each class, and (4) the average of the dependent factor for each class.

This summary table reveals considerable information about the relationship between wheat yields and rate of farm income. Farms with yields of 14 or fewer bushels per acre have an unweighted average rate of income of only 2.9%, hardly a profitable return. Farms with yields of 15 to 21 bushels per acre have on the average a return of 6.2%, more than twice that of the low-yield farms. Farms with yields of 22 or more bushels per acre have on the average an income of 10.0%, more than three times that of the low-yield farms. This simple table reveals relationships which a farm economist would easily comprehend. Since the costs of preparing the soil, sowing the crop, and harvesting are approximately the same per acre regardless of yield, it follows that large yields are much more profitable than small ones. From such a table it is possible to predict the average rate of income for farms within any given class.


TABLE 20
SUMMARY TABLE

Final Summary of Average Relationship Between Wheat
Yields in Bushels and Rate of Income on
85 Oklahoma Farms, 1939-1940

Wheat Yields,          Farm Income,
Average per Class      in Per Cent
(bu.)

     12.3                  2.9
     17.8                  6.2
     24.3                 10.0


Summary Table 20 is an even briefer but no less meaningful presentation of the significant relationships. In this case the comparison is between the average wheat yield of each class and the average rate of income for the same class. It is a positive relationship: as the average yield rises by steps of roughly 6 bushels, the rate of income rises by steps of between 3 and 4 points, or about six-tenths of a point per bushel. Figure 62 shows this relationship graphically.

If it is desired to bring two independent variables into the analysis at once, a two-way organization table is necessary. In this case, in addition to wheat yields, the influence of size of farm on rate of income is taken into the analysis. The inclusion of a second independent variable increases the complexity of the organization table.
Fig. 62. Relationship between yield of wheat and rate of income for 85 farms as shown in Summary Table 20


Class intervals must be chosen for the additional variable. In this case farm sizes are divided into three class intervals as follows: (1) farms under 200 acres, (2) farms of 200 to 320 acres inclusive, and (3) farms of more than 320 acres. These class intervals are laid out across the top of the organization table. Under each class of the top variable a complete set of class intervals for the second variable is placed. This second set of class intervals must be repeated in exactly the same form under each class of the top variable if a complete cross classification of the data is to be organized. This is the point that usually gives beginning students the greatest difficulty, and the exact relationships required should be noted carefully.

At this point three alternative forms of a two-way organization table are available.

First Form. The values of the dependent variable only may be recorded in the appropriate class-interval columns. In this case that means only the rates of income would be recorded in the table. There would be only one column of figures under each class interval, and it would contain the rate of income figures. The first three figures of the first column would be 1.6, 0.1, 2.8. The first three figures of the second column would be 1.0, 3.0, 2.3. Only the % column would appear; the bu. column would be absent. For many purposes this type of table would be sufficient.
Second Form. If it is desired to show in the summary table a more exact measure of the relationship between one of the independent variables and the dependent variable, a double column containing the figures for both of these variables must be placed under each class interval. In Table 21 these are (bu.) and (%). This form of organization makes it possible to obtain totals and averages for both variables and to compute the ratios between them. Some of these results are shown in Summary Tables 22 and 23.

Third Form. If it is desired to compute such summary ratios and detailed relationships between the dependent variable and both of the independent variables, the original figures for both independent variables must be included under each class interval. For the data above there would then appear, under each class of wheat yields, three sub-columns as follows: (acres), (bu.), (%). With such an organization table three cross summaries could be shown: (1) Size of Farm to Wheat Yields, (2) Size of Farm to Rate of Income, and (3) Wheat Yields to Rate of Income. Such an organization table is not included here because it would be too large to place on one page of the book, but it is hardly more difficult to make than Table 21. It would have to be at least 50% wider than Table 21.
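One way to read the count of three cross summaries is that every pair of recorded data sub-columns can be summarized against each other, so k sub-columns allow k(k-1)/2 cross summaries. A minimal Python sketch, with the sub-column labels taken from the text:

```python
from itertools import combinations

# Data sub-columns recorded under every class interval in the "third form"
# (labels follow the text: acres, bu., %).
sub_columns = ["Size of Farm (acres)", "Wheat Yield (bu.)", "Rate of Income (%)"]

# Each pair of recorded sub-columns supports one cross summary.
cross_summaries = list(combinations(sub_columns, 2))
for number, (first, second) in enumerate(cross_summaries, start=1):
    print(f"({number}) {first} to {second}")
```

`itertools.combinations` yields the pairs in the order the sub-columns are listed, which here reproduces the order (1), (2), (3) given in the text.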

Summary Table 22 shows the average rate of income as influenced by yields of wheat and size of farms. Wheat yields are the most important factor in income for these farms. For low yields the rate is only 2.9%, while for high yields it is 10.0%, a range of 7.1 percentage points. On the other hand, moving from the smallest to the largest class of farms raises the income only from 4.5% to 8.0%, a range of 3.5 points. Yield, therefore, has approximately twice as much effect on income as does size of farm.

TABLE 22
SUMMARY TABLE

Summary of the Relationships Between Rate of Income and Wheat
Yield, and Between Rate of Income and Size of Farm, Showing
Combined Effect of Two Independent Variables on Income
(Average rate of income, %)

Wheat Yields                  Size of Farm Units
Class Intervals       Under 200   200 to 320    Over 320    Averages
                          Acres        Acres       Acres

14 bu. and under            2.6          2.6         4.4         2.9
15 bu. to 21 bu.            5.3          5.3         8.9         6.2
22 bu. and over             8.2         10.3        10.3        10.0

Averages                    4.5          6.2         8.0         5.9
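The comparison of the two spreads can be reproduced from the marginal averages of Table 22. This Python sketch takes those averages as given; the `spread` helper (highest minus lowest class average) is an illustrative name, and comparing spreads is only a rough gauge, since it ignores the unequal numbers of farms in the classes.

```python
# Marginal averages of the rate of income (%) read from Summary Table 22.
by_yield = {"14 bu. and under": 2.9, "15 bu. to 21 bu.": 6.2, "22 bu. and over": 10.0}
by_size = {"Under 200 acres": 4.5, "200 to 320 acres": 6.2, "Over 320 acres": 8.0}


def spread(averages):
    """Highest minus lowest class average, in percentage points."""
    return round(max(averages.values()) - min(averages.values()), 1)


yield_spread = spread(by_yield)
size_spread = spread(by_size)
print("spread by yield:", yield_spread)                 # 7.1
print("spread by size:", size_spread)                   # 3.5
print("ratio:", round(yield_spread / size_spread, 1))   # 2.0
```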


Computing Totals and Averages for Summary Tables 

The averages for a summary table are computed from the totals in the sub-classes of the organization table as follows: (1) add the totals of all sub-classes for each class interval, and (2) divide each sum by the number of farms in that class interval. Example:

CLASS INTERVAL
Wheat Yields 14 Bu. and Under

                Sub-class Totals    Sub-class Totals
        Farms       (bu.)                 (%)

           16         193                41.1
            9         110                23.0
            6          78                26.5
Totals     31         381                90.6

Average wheat yield:       381 / 31 = 12.3 bu.
Average rate of income:   90.6 / 31 = 2.9%


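All the other averages of the summary tables were obtained from the organization table by the same method. Because each class total is divided by the total number of farms in the class, the summary-table averages are weighted averages: each sub-class counts in proportion to the number of farms it contains.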
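The two-step procedure, and the difference between a weighted and an unweighted average, can be sketched in Python using the sub-class totals of the example above. The variable names are illustrative; the unweighted figure is computed only for contrast and does not appear in the text.

```python
# Sub-class totals for the class interval "14 bu. and under", from the example above:
# (number of farms, total bushels, total rate of income in %) for each sub-class.
sub_classes = [(16, 193, 41.1), (9, 110, 23.0), (6, 78, 26.5)]

# Step 1: add the sub-class totals for the class interval.
farms = sum(n for n, _, _ in sub_classes)
total_bu = sum(bu for _, bu, _ in sub_classes)
total_pct = round(sum(pct for _, _, pct in sub_classes), 1)

# Step 2: divide by the number of farms in the class interval (a weighted average).
weighted_bu = round(total_bu / farms, 1)
weighted_pct = round(total_pct / farms, 1)

# For contrast only: the simple average of the three sub-class averages.
unweighted_pct = round(sum(pct / n for n, _, pct in sub_classes) / len(sub_classes), 1)

print(farms, total_bu, total_pct)   # 31 381 90.6
print(weighted_bu, weighted_pct)    # 12.3 2.9
print(unweighted_pct)               # 3.2
```

The weighted results agree with the 12.3 bu. and 2.9% of the summary tables; the simple average of the three sub-class rates comes out higher because the six farms of the last sub-class count as much as the sixteen of the first.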


TABLE 23. SUMMARY TABLE
Variations in Wheat Yields for Farms of Various Sizes
(Average wheat yield, bu.)

Wheat Yields                  Size of Farm Units
Class Intervals       Under 200   200 to 320    Over 320    Averages
                          Acres        Acres       Acres

14 bu. and under           12.1         12.2        13.0        12.3
15 bu. to 21 bu.           12.7         18.0        17.8        17.8
22 bu. and over            25.0         24.3        24.3        24.4
Summary Table 23 shows how average wheat yields vary with size of farm within each class of yields. It is merely a further indication of the various types of relationships which may be drawn out of a well-designed organization table.

TABLE 24. THREE-WAY ORGANIZATION TABLE

Organization of the Data for Three Independent Variables,
Wheat Yield, Number of Cows and Size of Farms, as Related
to the Rate of Income of 85 Oklahoma Farms
(Entries are rates of farm income, %)

Farms with 6 Cows or Less

         Under 200 Acres        200 to 320 Acres         Over 320 Acres
       14 Bu.  15-21 22 Bu.    14 Bu.  15-21 22 Bu.    14 Bu.  15-21 22 Bu.
      & Under    Bu. & Over   & Under    Bu. & Over   & Under    Bu. & Over

          2.8    1.0   10.5       1.8    9.1    8.0       5.6   10.5   11.0
          1.4    3.0    5.7       2.3    6.5    8.4       3.7    7.7    7.4
          2.9    2.3              2.0    5.4   10.2       3.0           6.2
          0.2    1.9              3.1    5.2   13.2
          6.5    5.7              1.1    8.3
          0.6    9.2              4.5
          1.8    8.0              4.8
          1.0    8.9
          3.5    6.6
          2.0    7.9
          4.3    8.9
                 6.6
                 2.5
                 5.1
                10.2

Total    27.0   87.8   16.2      19.6   34.5   39.8      12.3   18.2   24.6
Means     2.5    5.9    8.1       2.8    6.9   10.0       4.1    9.1    8.2

Farms with More Than 6 Cows

         Under 200 Acres        200 to 320 Acres         Over 320 Acres
       14 Bu.  15-21 22 Bu.    14 Bu.  15-21 22 Bu.    14 Bu.  15-21 22 Bu.
      & Under    Bu. & Over   & Under    Bu. & Over   & Under    Bu. & Over

          1.6           8.5       2.6    3.8   17.8       2.8   13.7   14.7
          0.1                     0.8    1.4    7.9       6.4    8.3    7.2
          4.0                            2.5    6.2       5.0    9.0   14.6
          5.4                            5.8   16.9              4.2   10.6
          3.0                            6.5    9.7
                                         5.6   10.0
                                         3.5    5.0

Total    14.1           8.5       3.4   29.1   73.5      14.2   35.2   47.1
Means     2.8           8.5       1.7    4.4   10.5       4.7    8.8   11.8
If one wished to exhaust the possibilities of analyzing the data on farms in Data Table 17, it would be necessary to use a three-way organization table, such as Table 24.

The fundamental principle is laid down at this point that the complexity of organization tables increases in proportion to the product of the number of class intervals for each variable, multiplied by the number of data sub-classes. This principle may be illustrated by Organization Tables 18, 21, and 24. In Table 18 there are 3 class intervals and 2 sub-classes, or 3 × 2 = 6 columns. In Table 21 there are 3 class intervals for farm sizes, 3 class intervals for wheat yields, and 2 sub-classes:

3 × 3 × 2 = 18 columns

In Table 24 there are 2 classes for number of cows, 3 classes for size of farms, 3 classes for wheat yields, and 1 sub-class:

2 × 3 × 3 × 1 = 18 columns

If there had been 3 classes for cows and four sub-classes for the items of data, a complete organization table would have required

3 × 3 × 3 × 4 = 108 columns

It is, of course, possible to make any number of class intervals desired. If there had been 4 classes for each of the three variables, with four sub-classes, the result would have been

4 × 4 × 4 × 4 = 256 columns

for the organization table, a table so large that it would cover the entire top of a large desk. For this reason there is a practical necessity for keeping both the number of class intervals and the number of variables used in tabular analysis to a workable minimum.
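The rule for the size of a complete organization table is a simple product, sketched here in Python. The function name `organization_columns` is hypothetical, and `math.prod` requires Python 3.8 or later.

```python
from math import prod


def organization_columns(class_counts, sub_columns):
    """Columns in a complete organization table: the product of the numbers of class
    intervals of all independent variables, times the number of data sub-columns."""
    return prod(class_counts) * sub_columns


examples = {
    "Table 18 (3 yield classes, 2 sub-columns)": ([3], 2),
    "Table 21 (3 sizes x 3 yields, 2 sub-columns)": ([3, 3], 2),
    "Table 24 (2 cow classes x 3 sizes x 3 yields, 1 sub-column)": ([2, 3, 3], 1),
    "3 cow classes, 4 sub-columns": ([3, 3, 3], 4),
    "4 classes for each variable, 4 sub-columns": ([4, 4, 4], 4),
}
for label, (classes, subs) in examples.items():
    print(f"{label}: {organization_columns(classes, subs)} columns")
```

The five cases print 6, 18, 18, 108, and 256 columns, the figures worked out in the text.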

Another cogent reason for not going beyond four variables in tabular analysis is that the relationships among the variables become so complex that it is increasingly difficult for the statistician to discover, measure, and explain them all adequately in any manageable space or time. This method is easy and effective within narrow limits but becomes unmanageable when expanded too far.
A three-way organization table is usually best set up in the form of Table 24, in which two variables are placed one above the other across the table, or on the X-axis, and the third variable is placed on the side, or on the Y-axis. In Table 24, size of farms is placed in three classes across the entire table. Wheat yields are placed in three classes under each of the farm-size classes. This arrangement takes care of two variables. The third variable, number of cows, is introduced by duplicating the above table so that the upper section is marked “6 cows or less” while the lower section is marked “over 6 cows.” If we wished to divide cows into three classes, it would be necessary to add still another section below, with the third class of cows as a heading. By this process a table can be expanded indefinitely on both axes.
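The nested layout of Table 24 can be generated mechanically: the yield classes are repeated under every size class, and the whole block is repeated for each class of cows. A minimal Python sketch using `itertools.product`; the label strings are abbreviations of the table headings, not the book's wording.

```python
from itertools import product

cow_sections = ["6 cows or less", "Over 6 cows"]  # stacked top to bottom
size_classes = ["Under 200 acres", "200 to 320 acres", "Over 320 acres"]  # across the top
yield_classes = ["14 bu. and under", "15 to 21 bu.", "22 bu. and over"]  # under each size

# For every section, the yield classes repeat in the same order under each size class.
layout = {
    section: [f"{size} / {yields}" for size, yields in product(size_classes, yield_classes)]
    for section in cow_sections
}

for section, columns in layout.items():
    print(section)
    for number, heading in enumerate(columns, start=1):
        print(f"  column {number}: {heading}")
```

Adding a third class of cows would only mean adding a third entry to `cow_sections`, which corresponds to adding another section below the table.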

Many types and models of summary tables may be developed for such a complex organization table as Table 24. Table 25 is one of the simpler general forms. It shows the relationships between (1) wheat yields and rate of income, (2) size of farms and rate of income, and (3) number of cows and rate of income, for class intervals and totals. If two or more data sub-classes had been included under the wheat yield class intervals, summary tables could also have been set up for the relationships between (1) cows and wheat yields, (2) cows and size of farms, and (3) size of farms and wheat yields.

From Table 25 it is evident that the number of cows on a wheat farm does increase its income, but to a smaller degree than size of farm or wheat yields. The measurement of the influence of number of cows on income is clouded in this table by the fact that this influence does not appear to be uniform. For some sizes of farms and some yield classes, the farms with more cows have the higher rate; for others, where one would expect a higher rate, the income is lower. As an illustration of this variation, among farms yielding 14 bushels and under, the farms of under 200 acres with “over 6 cows” have a higher rate than those with “6 cows or less”: the rates are 2.8 and 2.5. For medium-size farms the order is reversed: 2.8 for farms with 6 cows or less and 1.7 for farms with more. Do more cows increase the income on some farms and decrease it on others? If so, what are the reasons, and does this table reveal them? It does not.
TABLE 25
SUMMARY TABLE

Summary of the Interrelationships Between the Three Independent
Variables of Wheat Yield, Number of Cows and Size of Farms and the
Dependent Variable of Rate of Income for 85 Oklahoma Farms, 1939-1940
(Average rate of income, %)

                                Wheat Yields
                       14 Bu. and   15 to 21 22 Bu. and     Totals
                            Under    Bushels       Over

6 cows or less                2.8        6.4        9.0        5.4
   Under 200 acres            2.5        5.9        8.1        4.7
   200 to 320 acres           2.8        6.9       10.0        5.9
   Over 320 acres             4.1        9.1        8.2        6.9
Over 6 cows                   3.2        5.9       10.8        6.8
   Under 200 acres            2.8                   8.5        3.8
   200 to 320 acres           1.7        4.4       10.5        6.6
   Over 320 acres             4.7        8.8       11.8        8.8

Average of Totals             2.9        6.2       10.0        5.9


LARGE SAMPLES FOR LARGE TABLES 

It will be noted that in Table 25 there is one cell, or location, which is vacant. There were no farms in this sample with less than 200 acres and a yield of 15 to 21 bushels that had over 6 cows. Why does this vacancy occur? This sample of 85 farms is too small to be spread over so large and complex a table. Tabular analysis is definitely a large-sample analysis; it is not effective with small groups of data. In order to work out tables such as Table 24 and Table 25 with dependable results, there should be 200 or more items in the sample; 400 or 500 would be still better. In one cell of Table 24 there is a clear vacancy. Another cell has only one item. Three more cells have only two items. Only two cells, the first and second, have sufficiently large samples to be dependable. If there had been 500 farms instead of only 85, the results would have been much more dependable and useful.

TABLE 26
DATA TABLE

Data on Total Population, Number of Non-Farm Families
and Total Income in Thousands of Dollars
for Alabama by Counties, 1930 *

             Total    No. of     Total                 Total    No. of     Total
County  Population  Non-Farm    Income    County  Population  Non-Farm    Income
                    Families    ($000)                        Families    ($000)

     1      19,694     1,294    $4,115        35      45,935     4,636   $12,126
     2      28,289     3,721     7,153        36      36,881     2,129     6,621
     3      32,425     2,900     5,891        37     431,493   100,974   286,478
     4      20,780     2,573     4,721        38      18,001       766     3,505
     5      28,020     1,472     5,626        39      41,131     3,713    10,958
     6      20,016     1,462     3,365        40      26,942       703     5,171
     7      30,195     2,807     5,920        41      36,063     4,566     9,101
     8      55,611     8,663    20,981        42      36,629     1,694     8,112
     9      39,313     3,955    10,058        43      22,878       751     3,644
    10      20,219       501     4,000        44      27,103     1,579     4,771
    11      24,579     1,437     4,983        45      64,623     6,867    18,553
    12      20,513     1,628     4,202        46      36,426     2,443     8,347
    13      26,016     2,300     5,243        47      25,967     1,530     5,328
    14      17,768       684     3,030        48      39,802     1,980     9,297
    15      12,877       597     2,006        49     118,363    25,973    58,026
    16      32,556     1,966     5,505        50      30,070     1,885     5,711
    17      29,860     3,805     9,163        51      98,671    19,032    50,942
    18      25,429     1,857     4,184        52      46,176     5,356    15,386
    19      12,460       633     2,070        53      26,385     1,565     5,187
    20      41,356     4,105    10,294        54      24,902     1,238     4,935
    21      23,656     1,475     4,785        55      32,240     2,737     6,707
    22      41,051     1,643     9,345        56      26,861     1,579     5,225
    23      23,175     1,764     4,427        57      27,377     2,435     4,432
    24      55,094     5,999    15,866        58      24,510     2,445     6,169
    25      40,104     1,552     8,339        59      27,576     3,096     6,971
    26      34,280     2,131     7,385        60      26,929     1,542     5,354
    27      27,963     3,371     6,532        61      45,241     4,867    11,818
    28      63,399     9,658    25,947        62      31,188     3,090     8,868
    29      18,443     1,004     4,408        63      64,153     9,069    22,914
    30      25,372     1,859     5,925        64      59,445     8,955    17,031
    31      30,104     1,968     5,590        65      16,365     1,446     3,188
    32      19,745     1,149     3,662        66      24,880     1,205     4,395
    33      26,265     1,236     4,897        67      15,596       936     2,603
    34      22,820     1,194     4,507     Total   2,646,248   317,146  $872,000

* Source: “Income in Counties of Alabama, 1929 and 1935,” by W. M. Adamson, Bureau of Business Research, School of Commerce and Business Administration, University of Alabama.
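The sparse cells of Table 24 discussed above can be flagged mechanically once the cell frequencies are known. In the sketch below, the counts are the numbers of entries in the eighteen columns of Table 24. The threshold of 10 items for a "dependable" cell is an assumption chosen for illustration, not a rule from the text.

```python
from collections import Counter
from itertools import product

size_classes = ["Under 200 acres", "200 to 320 acres", "Over 320 acres"]
yield_classes = ["14 bu. and under", "15 to 21 bu.", "22 bu. and over"]

# Number of farms in each column of Table 24, left to right within each section.
cell_counts = {
    "6 cows or less": [11, 15, 2, 7, 5, 4, 3, 2, 3],
    "Over 6 cows": [5, 0, 1, 2, 7, 7, 3, 4, 4],
}

cells = {
    (section, size, yields): count
    for section, row in cell_counts.items()
    for (size, yields), count in zip(product(size_classes, yield_classes), row)
}

MIN_ITEMS = 10  # assumed threshold for a dependable cell average (not from the text)

small = Counter(count for count in cells.values() if count <= 2)
dependable = [cell for cell, count in cells.items() if count >= MIN_ITEMS]

print("farms in the table:", sum(cells.values()))
print("cells with 0, 1, 2 items:", small[0], small[1], small[2])
print("cells at or above the threshold:", dependable)
```

The sketch reports one empty cell, one cell with a single item, and three with two items, and only the first two cells of the upper section reach the assumed threshold.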

Second Illustrative Problem 

It is difficult to bring out all phases and possibilities of the method of tabular analysis in one problem. This second illustration is based on Alabama data by counties on (1) total population, 1930, (2) number of non-farm families, and (3) total income. The data are socio-economic and bear on the level of living. The method of organizing them is somewhat different from that used on the farm data above.

PROBLEM OF CLASS INTERVALS 

An important part of the method of tabular analysis is the 
choice and manipulation of class intervals for the independent 
variables. The fewer the classes the easier is the problem but the 
less adequate are the results. In the previous problem we used no 
more than three classes for any variable. In this study we shall 
use four. Table 27 illustrates one possible set-up. 

TABLE 27
SUMMARY TABLE

Summary of Relationships Between Size of Population by Counties
and Per Capita Income by Counties for Alabama, 1930,
Based on Irregular Frequencies

Total Population      Number of    Total Incomes per    Per Capita
Class Intervals        Counties    Class Interval       Income
                                   ($000)

10,000- 49,999             58       $355,262            $217
50,000- 89,999              6        121,292             335
90,000-129,999              2        108,968             502
130,000-up                  1        286,478             664

Totals                     67       $872,000            $330


In Table 27 the first three class intervals for total population are equal and the fourth is open-end. The frequency distribution is very uneven, ranging from 58 counties in the lowest class to only one in the highest class. In spite of these irregularities, this summary table reveals important information about Alabama income by counties. On an average, the larger the total population of a county, the higher its per capita income. The rate for the largest county, $664, is more than three times the $217 average of the 58 counties in the lowest class. One difficulty with this method of setting up class intervals is that in most economic and social data the lower classes contain most of the items. In this case a long open-end class has to be resorted to in order to avoid several vacant classes.


UNIFORM CLASS FREQUENCIES 

By dividing the cases into halves, tertiles, quartiles, quintiles, or other groups of equal size, this uneven frequency distribution may be avoided. The possible divisions up to 10 are:


Name          Number of Classes

Halves                2
Tertiles              3
Quartiles             4
Quintiles             5
Sextiles              6
Septiles              7
Octiles               8
Noniles               9
Deciles              10


Of course, an even larger number of classes could be used, but, as was indicated above, such a division tends to make the problem too detailed and complex and requires too large a sample. For these reasons halves, tertiles, and quartiles are used most frequently, with occasionally a quintile. The more numerous divisions are rare.

In addition to the difficulty of sheer size involved in a five- or six-class-interval table, there is the problem of analyzing the relationships revealed and of computing the measures of error required to indicate the significance of the results.
TABLE 28. TWO-WAY ORGANIZATION TABLE

Organization of Data on Total Population, Number of Non-Farm
Families and Total Income to Show the Interrelationships
Among Variables by Counties for Alabama, 1930
(Pop. = total population; Inc. = total income in $000)

12,000 to 23,999 Population per County

               500-1449            1450-1966           1967-3799         3800-101,000
           Non-Farm Families   Non-Farm Families   Non-Farm Families   Non-Farm Families
                Pop.     Inc.       Pop.     Inc.       Pop.     Inc.       Pop.     Inc.

              19,694   $4,115     20,016   $3,365     20,780   $4,721
              20,219    4,000     20,513    4,202
              17,768    3,030     23,656    4,785
              12,877    2,006     23,175    4,427
              12,460    2,070
              18,443    4,408
              19,745    3,662
              22,820    4,507
              18,001    3,505
              22,678    3,644
              16,365    3,188
              16,596    2,603

Totals       217,657  $40,738     87,360   16,779     20,780    4,721
Per capita     $187                $192                $227

24,000 to 27,999 Population per County

               500-1449            1450-1966           1967-3799         3800-101,000
           Non-Farm Families   Non-Farm Families   Non-Farm Families   Non-Farm Families
                Pop.     Inc.       Pop.     Inc.       Pop.     Inc.       Pop.     Inc.

              24,579   $4,983     25,429   $4,184     26,016   $5,243
              26,265    4,897     25,372    5,925     27,963    6,532
              26,942    5,171     27,103    4,771     27,377    4,432
              24,902    4,935     25,967    5,328     24,510    6,169
              24,880    4,394     26,385    5,187     27,576    6,971
                                  26,861    5,225
                                  26,929    5,354

Totals       127,568  $24,380    184,046   35,974    133,442   29,347
Per capita     $191                $195                $220

28,000 to 39,999 Population per County

               500-1449            1450-1966           1967-3799         3800-101,000
           Non-Farm Families   Non-Farm Families   Non-Farm Families   Non-Farm Families
                Pop.     Inc.       Pop.     Inc.       Pop.     Inc.       Pop.     Inc.

                                  28,020   $5,626     28,269   $7,153     29,860   $9,163
                                  32,558    5,505     32,425    5,891     36,063    9,101
                                  36,929    8,116     30,195    5,920     39,313   10,058
                                  30,070    5,711     34,280    7,385
                                                      30,105    5,590
                                                      36,881    6,621
                                                      36,426    8,347
                                                      39,802    9,297
                                                      32,240    6,707
                                                      31,168    8,868

Totals                           127,577  $24,958    331,790   71,779    105,236   28,322
Per capita                         $195                $216                $269

40,000 to 450,000 Population per County

               500-1449            1450-1966           1967-3799         3800-101,000
           Non-Farm Families   Non-Farm Families   Non-Farm Families   Non-Farm Families
                Pop.     Inc.       Pop.     Inc.       Pop.     Inc.       Pop.     Inc.

                                  41,051   $9,345     41,130  $10,958     55,611  $20,981
                                  40,104    8,339                         41,356   10,294
                                                                          55,094   15,866
                                                                          63,399   25,947
                                                                          45,935   12,126
                                                                         431,493  288,478
                                                                          64,823   18,553
                                                                         118,363   58,026
                                                                          98,671   50,942
                                                                          46,176   15,386
                                                                          45,241   11,818
                                                                          64,153   23,914
                                                                          59,445   17,031

Totals                            81,155  $17,684     41,130   10,958  1,189,760  569,362
Per capita                         $218                $266                $478
Method for Determining Uniform Class Frequencies

The first step is to decide whether the classes shall be based on halves, tertiles, quartiles, or quintiles. In these Alabama data, quartiles are used. The second step is to make an array of the data for each independent variable to be included in the organization table. The third step is to divide each array into four parts as nearly equal as possible. Since a number of items that is not a multiple of 4 cannot be divided equally by 4, there is a slight but unimportant variation in one or two of the frequencies. Since Alabama has 67 counties, the quartile frequencies fall as follows: 17, 17, 17, and 16. Sixty-eight counties would make four equal frequencies of 17 each. The fact that the last class has only 16 is quite immaterial. The fact that the four frequencies are equal, or nearly equal, is of great importance, because it makes the dependability of the sample averages equal, or nearly equal.

TABLE 29
SUMMARY TABLE

The Number of Alabama Counties Which Fall in Each of the Four
Quartiles of Total Population and the Four Quartiles of
Non-Farm Families

                 Total Population per County
                12,000    24,000    28,000    40,000
Non-Farm            to        to        to        to    Totals
Families        23,999    27,999    39,999   440,000

500-1449            12         5                            17
1450-1966            4         7         4         2        17
1967-3799            1         5        10         1        17
3800-101,000                             3        13        16

Totals              17        17        17        16        67
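Dividing an array into classes of equal, or nearly equal, frequency can be sketched as follows. The helper names are hypothetical. Placing the extra items in the lower classes reproduces the 17, 17, 17, 16 split described for the 67 Alabama counties, but that placement is a choice, not a rule. Ties falling at a class boundary would also need a decision that this sketch does not make.

```python
def equal_frequency_sizes(n_items, n_classes):
    """Class frequencies as nearly equal as possible; any extra items go to the lower classes."""
    base, extra = divmod(n_items, n_classes)
    return [base + 1 if i < extra else base for i in range(n_classes)]


def split_array(values, n_classes):
    """Sort the values into an array and cut it into classes of nearly equal frequency."""
    ordered = sorted(values)
    classes, start = [], 0
    for size in equal_frequency_sizes(len(ordered), n_classes):
        classes.append(ordered[start:start + size])
        start += size
    return classes


print(equal_frequency_sizes(67, 4))  # [17, 17, 17, 16]
print(equal_frequency_sizes(68, 4))  # [17, 17, 17, 17]

# Any list of 67 values splits the same way; the class limits are read off the
# first and last items of each class.
quartile_classes = split_array(range(1, 68), 4)
print([(c[0], c[-1], len(c)) for c in quartile_classes])
```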

After the quartile limits and quartile frequencies are located, the organization of the table proceeds on the same principles that were laid down in the farm data problem above. The several sets of class intervals are located in their proper relations to each other, as is shown in Table 28. The number of data sub-classes is determined by the number of cross relationships which it is desired to show in the summary tables. In Table 28 only two data sub-classes are used, (Pop.) and (Inc.). These two are sufficient to give us Summary Tables 29, 30, and 31. If a third data sub-column for the figures on Non-Farm Families had been included, an additional series of summary tables would have been possible, showing relationships between (1) Total Population and Non-Farm Families, and (2) Per Capita Income and Non-Farm Families. The statistician is always at liberty to limit or expand the number and analysis of summary tables to suit his purpose.

Table 29 is a very simple device showing how the counties of Alabama are distributed among the class intervals of both Total Population and Non-Farm Families. It reveals how closely these two variables are related. If the data fall in a narrow diagonal band across the table, the relationship is high; the broader the band, the lower the relationship. Since the large figures in this case, 12, 7, 10, and 13, or 42 out of 67, fall in the four diagonal cells, it is clear that the relationship is quite high.
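The "narrow diagonal band" test can be made concrete by counting how many counties fall on the diagonal of Table 29. The share within one class of the diagonal is an extra, illustrative measure of the width of the band; the text does not use it.

```python
# Table 29: numbers of counties, with rows for the quartiles of non-farm families and
# columns for the quartiles of total population (empty cells entered as 0).
table_29 = [
    [12, 5, 0, 0],
    [4, 7, 4, 2],
    [1, 5, 10, 1],
    [0, 0, 3, 13],
]

total = sum(sum(row) for row in table_29)
on_diagonal = sum(table_29[i][i] for i in range(len(table_29)))

# Illustrative band-width measure: counties no more than one class off the diagonal.
within_one = sum(
    count
    for i, row in enumerate(table_29)
    for j, count in enumerate(row)
    if abs(i - j) <= 1
)

print(f"{on_diagonal} of {total} counties on the diagonal ({100 * on_diagonal / total:.1f}%)")
print(f"{within_one} of {total} within one class of it")
```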

Table 30 gives a much more detailed and complete analysis than does Table 29. It shows the total population and total income figures for each class. It therefore shows the true weight of each class and furnishes the basis for a complete series of weighted totals and averages for each class and variable.

An analysis as complete as that of Table 30 can be obtained only from an organization table detailed enough to reveal the desired information.
TABLE 30
SUMMARY TABLE

The Total Population Figures and Total Income Figures (in $000)
Which Fall in Each Quartile of Each Variable for Alabama Data
by Counties for Total Population and Number of Non-Farm Families

                    Total Population per County
                   12,000     24,000     28,000     40,000
Non-Farm               to         to         to         to     Totals
Families           23,999     27,999     39,999    440,000

500-1449          217,657    127,568                           345,225
                  $40,738    $24,380                           $65,118

1450-1966          87,360    184,046    127,577     81,155    480,138
                  $16,779    $35,974    $24,958    $17,684    $95,395

1967-3799          20,780    133,442    331,790     41,130    527,142
                   $4,721    $29,347    $71,779    $10,958   $116,900

3800-101,000                            105,236  1,189,760  1,294,996
                                        $28,322   $569,362   $597,684

Totals            325,797    445,056    564,603  1,312,045  2,646,248
                  $62,238    $89,701   $125,059   $597,304   $872,000

Notes: (1) The top number in each square is the population total for that class interval. (2) The lower number in each square is the total income for that class interval in thousands of dollars ($000).

Table 31 is the result of dividing the lower figure (Total Income) in each cell of Table 30 by the upper figure (Total Population) in the same cell, and multiplying by 1,000, since incomes are given in thousands of dollars. This table is the most complete and final of all three summary tables. It shows that (1) per capita income increases from $187 for the counties in the lowest class of both variables to $478 for those in the highest class of both, (2) per capita income increases from $189 on an average in the most rural counties to $461 in the most urban counties, and (3) per capita income increases from $191 in the counties of smallest population to $455 in the counties of largest total population. From this table it is evident that urbanization is the strongest factor influencing per capita income in Alabama.

TABLE 31
SUMMARY TABLE

Per Capita Income by Quartiles of Total Population
and Quartiles of Non-Farm Families
for Alabama by Counties, 1930

                    Total Population per County
                   12,000     24,000     28,000     40,000    Average
Non-Farm               to         to         to         to        for
Families           23,999     27,999     39,999    440,000      Lines

500-1449             $187       $191                             $189
1450-1966            $192       $195       $196       $218       $199
1967-3799            $227       $220       $216       $266       $222
3800-101,000                               $269       $478       $461

Average for
Columns              $191       $202       $221       $455       $330
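The division that produces Table 31 can be sketched in Python from the cell totals of Table 30. Income is in thousands of dollars, so each quotient is multiplied by 1,000. The line and column averages pool the cells (total income over total population), which makes them weighted averages. The helper names are illustrative.

```python
# Cells of Table 30 as (total population, total income in $000); None marks an empty cell.
# Rows: quartiles of non-farm families; columns: quartiles of total population.
table_30 = [
    [(217_657, 40_738), (127_568, 24_380), None, None],
    [(87_360, 16_779), (184_046, 35_974), (127_577, 24_958), (81_155, 17_684)],
    [(20_780, 4_721), (133_442, 29_347), (331_790, 71_779), (41_130, 10_958)],
    [None, None, (105_236, 28_322), (1_189_760, 569_362)],
]


def per_capita(population, income_thousands):
    """Income per person in whole dollars; income is given in thousands of dollars."""
    return round(income_thousands * 1000 / population)


def pooled(cells):
    """Weighted per capita income of several cells: total income over total population."""
    filled = [cell for cell in cells if cell is not None]
    return per_capita(sum(p for p, _ in filled), sum(i for _, i in filled))


grid = [[per_capita(*cell) if cell else None for cell in row] for row in table_30]
line_averages = [pooled(row) for row in table_30]
column_averages = [pooled(column) for column in zip(*table_30)]

for row, average in zip(grid, line_averages):
    print(["" if v is None else f"${v}" for v in row], f"average ${average}")
print("column averages:", [f"${v}" for v in column_averages])
```

Rounded to the nearest dollar, most results agree with the printed table. A few differ slightly (the largest cell comes out $479, and the last line and column averages come out a dollar or so higher), which reflects the rounding and totals used in the original.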

Third Sample of Tabular Analysis 

Tabular analysis is not only the most widely used method of statistical analysis involving two or more variables whenever large samples are available, but it has also proved especially popular in the field of education. This is probably because data in education are very abundant and are bound up in such complex relationships that more highly mathematical methods often give rather artificial results. This is not to say that regression and multiple and partial correlation may not often be used with excellent results on educational and psychological data, but that these data often yield better results from the more flexible and less rigid methods of tabular analysis. Table 32 gives the results in summary form for 733 college students, showing the relationships between scores on the A.C.E. college entrance tests and the grade points received in college. It shows that only 5% of the students falling in the lowest quartile of the A.C.E. tests excelled in college (a grade point average of 3.0 to 4.0), while 49% had less than a C average (below 2.0). On the other hand, 40% of the students in the upper quartile of the A.C.E. tests excelled in college, and 84% had a C average or better. This type of analysis is not as rigid as a mathematical regression line or a correlation table, but it does sift the data into appropriate cells of logical relationships, which reveal a large number of variations in detailed meanings and measurements.

TABLE 32 
SUMMARY TABLE 

Analysis Summary of College Aptitude and Achievement Tests
of 733 Arts and Sciences Students, Okla. A. & M. College,
1930-39, Showing Totals and Percentages by Quartiles*

A.C.E. Quartiles               Grade Point Averages
(Percentile)         0.0-0.9     1.0-1.9     2.0-2.9     3.0-4.0      Totals

0-24                 9 (11%)    32 (38%)    38 (46%)      4 (5%)   83 (100%)
25-49                14 (9%)    49 (31%)    81 (52%)     13 (8%)  157 (100%)
50-74                12 (6%)    48 (22%)   111 (52%)    42 (20%)  213 (100%)
75-99                 5 (2%)    38 (14%)   125 (44%)   112 (40%)  280 (100%)

Totals, Grade
Average Groups            40         167         355         171         733


* Source: Psychological Tests as Administrative Tools, Dean Schiller
Scroggs.
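The percentages in Table 32 are row percentages: each count divided by the total for its A.C.E. quartile. The sketch below recomputes them from the counts, along with the column totals and the two shares quoted in the text. It works to one decimal, so its figures may differ by a point from the whole-number percentages printed in the table. For example, 237 of the 280 upper-quartile students (84.6%) had grade averages of 2.0 or higher, while adding the printed 44% and 40% gives 84%. The dictionary layout and function name are illustrative.

```python
# Counts from Table 32. Rows are A.C.E. percentile quartiles; columns are
# the grade-point ranges 0.0-0.9, 1.0-1.9, 2.0-2.9, and 3.0-4.0.
table_32 = {
    "0-24": [9, 32, 38, 4],
    "25-49": [14, 49, 81, 13],
    "50-74": [12, 48, 111, 42],
    "75-99": [5, 38, 125, 112],
}

def row_percentages(counts):
    """Each cell as a percentage of its row total, to one decimal place."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]

for quartile, counts in table_32.items():
    print(quartile, sum(counts), row_percentages(counts))

# Column totals, i.e. the "Totals, Grade Average Groups" line.
column_totals = [sum(column) for column in zip(*table_32.values())]
print(column_totals, sum(column_totals))  # [40, 167, 355, 171] 733

# Shares quoted in the text, computed directly from the counts.
lowest, highest = table_32["0-24"], table_32["75-99"]
below_c_lowest = round(100 * sum(lowest[:2]) / sum(lowest), 1)         # 49.4
c_or_better_highest = round(100 * sum(highest[2:]) / sum(highest), 1)  # 84.6
print(below_c_lowest, c_or_better_highest)
```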
Computation of Error in Tabular Analysis

The various methods and formulas for computing the error and 
reliability of the results in tabular analysis are taken up in their 
logical place in the following chapter in which the problems of 
sampling, probability, and error for all large samples are discussed 
in detail. The purpose of the present chapter is only to set up the 
detailed methods of making a tabular analysis. Chapter 14 pre- 
sents the methods of measuring the dependability of the results. 

Advantages of Tabular Analysis 

1. In tabular analysis one is not forced to make any pre-judg- 
ments or assumptions as to the nature of the relationships in the 
data, either as to whether the regression is straight-line or curvilinear,
or whether the relationships are arithmetic or geometric. It
usually reveals whatever relationships exist in the form in which
they occur, straight-line if they are straight and curved if they are
actually curved.

2. It is a much shorter method than complex mathematical 
analysis and is an excellent device for discovering what relation- 
ships do exist before spending the time and money to make a long 
mathematical analysis. It greatly aids in deciding on what 
specific mathematical functions to employ if such analysis is 
later desired. 
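To see the first advantage in miniature, the sketch below compares a class-to-class table of means with a least-squares straight line on a small constructed data set. The data (y = (x - 10) squared) are hypothetical, chosen only because the relationship falls and then rises, and the function names are illustrative. The x values are cut into four classes holding equal numbers of items, and the mean of y is taken within each class.

```python
# Hypothetical, constructed data (not from the text): y = (x - 10) ** 2,
# a relationship that first falls and then rises.
xs = list(range(20))
ys = [(x - 10) ** 2 for x in xs]

def class_means(xs, ys, k):
    """Sort the pairs by x, cut them into k classes holding equal numbers
    of items, and return the mean of y within each class."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    means = []
    for i in range(k):
        chunk = pairs[i * n // k:(i + 1) * n // k]
        means.append(sum(y for _, y in chunk) / len(chunk))
    return means

def least_squares_slope(xs, ys):
    """Slope b of the least-squares straight line y = a + b*x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

print(class_means(xs, ys, 4))       # [66.0, 11.0, 6.0, 51.0]
print(least_squares_slope(xs, ys))  # -1.0
```

The straight line reports a steady fall of one unit of y per unit of x, while the four class means (66, 11, 6, 51) show the curve that is actually present.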

Disadvantages of Tabular Analysis 

1. In tabular analysis, no ratio or coefficient of correlation is
obtainable.

2. No least-squares regression line is obtainable, but a class-to-class
trend is.

3. No measure of partial correlation is available and it is not 
possible to get a full, clear measure of the relative weight of each 
independent variable as it affects the total combined result. The 
relative weight of the independent variables is somewhat ob- 
scured if there are more than two. 

With all of these mathematical limitations, tabular analysis is 
more widely used than any other complex method of statistical 
analysis. 
SUMMARY

1. Tabular analysis is not just the making of tables but the use of 
statistical tables to analyze data. 

2. Data tables are simply the associated values of the original data 
in unorganized form. 

3. Organization tables are the arrangement of the original data into 
various combinations of columns and lines to reveal the relationships 
which obtain in the data. They may be designed in any form to suit 
any purpose. 

4. The form of organization table to be used in the analysis should be 
clearly in mind as a part of the planning of the study before the data 
are collected. 

5. Summary tables are brief tabulations of the results revealed in the 
organization tables. They may be carried to any desired degree of com- 
plexity. 

6. Complete cross-classification tables are usually the most effective 
type of table for tabular analysis. The construction of such tables is 
explained in Chapter 7. 

7. In making organization tables it is usually preferable to divide the 
range of data into class intervals which will contain approximately equal 
numbers of items. There may be any number of class intervals for each 
variable, but two, three, or four are the most common. More than five 
classes makes the table too large and complicated for good results. 

8. The reliability of the means of the various classes in the table may 
be computed by the formulas given in Chapter 14. 

9. Tabular analysis does not compel any pre-judgments as to the nature 
of the functions or relationships existing in the data; it simply reveals 
the facts as they are. 

10. Tabular analysis is usually a shorter and less laborious type of 
analysis than mathematical analysis. 

11. Tabular analysis has the defect that it does not supply a coefficient 
of correlation or a mathematical regression line. 
REVIEW QUESTIONS

1. What is the difference between tables and tabular analysis? 

2. What is a "data table"? How must it be organized? Explain.

3. What is an "organization table"? On what principles should
such a table be organized? Explain.

4. What is the difference between a one-way organization table and 
a two-way organization table? Explain. 

5. What is the difference between a two-way and a three-way organiza- 
tion table? Explain in detail. 

6. What is a summary table? On what factors does its form depend? 
What variety of forms may it take? 

7. Should class intervals be of the same width? Why? Explain. 

8. Should class frequencies be equal or approximately equal or does 
this condition make any difference in the analysis? Why? 

9. How many columns would be required for a two-way four-class 
table that furnished complete sub-class totals? 

10. How many columns would be required for a three-way table with 
sub-totals for one independent and the dependent variables if based on 
four classes for each variable? 

11. Ordinarily how many classes should there be for each variable? 
Why? 

12. What are the advantages and disadvantages of having only two 
class intervals for each variable? 

13. What are the advantages and disadvantages of having six or eight 
class intervals for each variable? 

14. What are the advantages and disadvantages of Tabular Analysis 
as compared with correlation? 

15. For what types of data and problems is it better adapted? Why? 




CHAPTER 14 

SAMPLING, PROBABILITY, 
AND ERROR 


We have now carried our study of statistical methods to a 
point where it is necessary to introduce the theory of sampling, 
probability, and error. We have been dealing with samples from 
the first chapter of this book. We have thrown them into fre- 
quency tables and charts and measured them as to averages, dis- 
persion, skewness, regression, and correlation. Two samples in 
particular, the height and weight of 106 grade-school children,
and wheat yields on 160 farms, were purposely carried through all
these successive computations so that the student might see the analysis
of a sample developed in detail from a simple tabulation of data to 
the most complex and complete coefficients. By this method the 
relation and unity of the various analyses have been emphasized. 
With this background of method in mind, it is well to ask, what 
is a sample? How may one obtain a good sample? How large 
should the sample be? How may its dependability and adequacy 
be measured? 


UNIVERSE OR POPULATION 

Any complete field of data, such as wheat yields per acre, 
weight and height of school children, stock prices, wages of car- 
penters, tensile strength of steel, production of petroleum wells, 
rainfall, etc., is called a universe or a population. Since a uni- 
verse or population is an entire field of data, it may be infinitely 
large. In most cases it is impossible to measure an entire universe 
or population. To study the yield of all the millions of acres in
wheat, or any particular characteristic of all the millions of 
school children, or the tensile strength of all wood, or the tem- 
perature and rainfall of every square mile of the earth’s surface 
would not only be impossible in most cases but also very expen- 
sive in all cases. Such extensive studies would be a foolish waste 
of time and money. Equally accurate results for all practical pur- 
poses may be obtained from a relatively small, properly chosen 
sample with a minimum expenditure of time and money. The 
sampling of large populations is the most economical device known 
to man. To measure thoroughly all important characteristics of 
every bushel of wheat in a carload would cost more than the 
wheat is worth. A small well-mixed sample taken at each end 
and in the middle of the car of wheat will give approximately 
the same results as a total study of every bushel and give it at 
one-thousandth part of the cost. Sampling is literally the basis 
of all human decisions and is the foundation of much scientific 
analysis and knowledge. 

THEORY OF SAMPLING 

The possibility of effective sampling depends on the permanence, 
dependability, and uniformity of the nature and characteristics 
of the population sampled. If all populations varied without 
limit within themselves, no sampling would be valid. It is only 
because we live in a world with relatively fixed characteristics and defi- 
nitely limited variations that we can select a small portion of a popu- 
lation and still be certain that we have a fairly accurate picture of 
that population. Wheat varies to some extent within itself in 
moisture content, weight, protein, etc., but it is always wheat. 
It is never cotton or grass or even oats. There are certain per- 
manent and dependable characteristics of boys and girls which are 
universal and can be depended on to remain constant in a popu- 
lation of adolescents. 

The dependability of samples and the adequacy of the sampling 
method in statistics depend upon at least five permanent char- 
acteristics of numbers. These are: 
The Inertia of Large Numbers

If a number of relatively small samples are taken from a large pop- 
ulation, all the samples will be found to be quite similar. This re- 
sult proves that the large body of numbers, or the population, is a 
quite constant body. Its character, its movement, and its inertia 
are not easily changed. 


The Permanence or Persistence of Small Numbers 

If a population contains a variety of characteristics, a little of most 
or all of these several characteristics will be found in any random sam- 
ple of that population. As illustrations of this principle, one may 
point out that if there is a mixture of red corn in a field of white 
corn, a red ear will be found at somewhat regular intervals through- 
out the field. The Mendelian distribution of recessive charac- 
teristics is a clear illustration of this principle. 

Multiplicity of Forces 

We live in a world in which a multiplicity of forces affects each ob- 
servation. Every grain of wheat and every child is the result of a 
vast number of forces of heredity and environment. This fact explains
why there is so much variation within families, communities,
groups, and samples, and why no sample will be perfectly uniform.
"Variety is the spice of life" is a homely way of
recognizing this universal characteristic of populations.

Independence of Forces 

This vast complex of forces affecting all life and existence, although 
related, acts with a large degree of independence on individual observa- 
tions and specimens. This fact explains why the children of the 
same parents, while quite similar in most respects, may vary con- 
siderably in some particular characteristics. It explains why a 
series of samples, while having much in common, may contain indi- 
vidual observations that are quite different. The basic law of 
existence is diversity in unity. 
Equality of Forces

These forces are so balanced and related that they tend to generate 
equal values above and below the average. There appears to be no 
long-time bias toward smaller or larger values. If in a particular 
species, such as the horse, the evolution through the ages pro- 
duces a larger animal, it seems to be not that there is a bias toward 
larger animals in the forces of heredity, but that only the larger 
ones actually survive in the difficult environment in which they 
must live. In any case, at any particular time those individual 
cases which fall below the average are about the same in number as 
those which rise above the average. The validity of sampling rests 
upon this universal tendency for the items of a population to 
cluster about the center of the range. They pile up in the middle 
with about equal scatter on either side. Of course, sampling 
procedure is appropriate to non-normal as well as to normal dis- 
tributions. 

Because of this random movement of these forces, it is possible 
by means of a random sample to obtain a reliable and realistic 
cross-section of a population. This is basic in statistics. The 
foundation of all statistical analysis is the representativeness and 
dependability of random samples. 


METHOD OF SAMPLING 

Statisticians generally recognize four different methods of sam- 
pling. The first is the random sample. By random sampling we 
mean the selection of a sample in such a manner that every item 
in the population has an equal chance of being included in the sample. 
There is no bias in true random sampling. No particular class 
or group is barred, minimized or magnified by any means. An 
example of bias is the selection of a sample of a city from its tele- 
phone directory. Most very poor families do not have telephones 
and are automatically eliminated. Persons living in hotels or 
rooming houses may not have a telephone. Transients would 
be excluded. Such a bias is fatal to the adequacy of the sample. 
The failure of the Literary Digest poll on the 1936 Presidential
election was caused by such a bias. It failed to include the lower
economic levels of the population. Bias in sampling does not
mean that there is a desire to misrepresent the facts. It means 
that there is some hidden and unwanted fault in the sampling 
that makes it unrepresentative of the population. The re- 
searcher must always be on the alert to discover such pitfalls, for 
they invalidate his entire work if they are large, and injure it 
under any conditions. 

There are many situations when one must depend on getting a 
random sample. If our present knowledge of the universe is 
quite limited, if we are unacquainted with the subgroups of the 
population, we can do no more than take the best random sample 
possible under the circumstances. Suppose that we wish a sam- 
ple of the workers on a large defense project. No enumeration 
has previously been made. We do not know whether they are 
young or old, rural or urban, white or Negro, native born or 
foreign born, citizens or aliens, skilled or unskilled. In fact, the 
reason for taking the sample is to discover as much information 
about them as possible. At present we have none. The most we 
can do is to use every precaution to get as good a random sample 
as possible. To do this the sample should not be taken at only 
one place or at only one time; for instance, at noon, or in the 
carpenter shop. The night workers may be quite different from 
the day workers, and the carpenters may be very dissimilar to the
welders. It should be taken from all over the plant during a 
twenty-four-hour period. A certain percentage or number of the 
total should be taken, as every fifth, or every tenth, or if the total 
is large, every fiftieth man, or one percent, or one-half, or one- 
tenth of one percent of the total. There should be an equal 
chance to include every one of the workers in the sample. A simple
illustration would be the selection of red and white marbles from a
box composed of 50 red and 50 white marbles of equal size. If one
shakes the box thoroughly, selects a marble, records its color,
replaces it in the box to keep the total constant, and then shakes and
draws again, a sample of 30 marbles should contain approximately half
white and half red marbles. In dealing with human beings or anything
less simple and uniform than marbles, the process is somewhat more
difficult, but it should approximate pure random sampling as nearly
as possible.
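The marble experiment is easy to imitate with a pseudo-random generator. The sketch below draws 30 marbles with replacement from a box of 50 red and 50 white, then repeats the experiment many times to show the average share of red settling near one-half. The seed, the number of repetitions, and the function name are arbitrary choices for illustration; a different seed gives different individual samples.

```python
import random

rng = random.Random(7)  # fixed seed for a repeatable run
box = ["red"] * 50 + ["white"] * 50

def draw_with_replacement(box, size):
    """Draw `size` marbles, putting each one back before the next draw.

    rng.choices picks with replacement, so every draw faces 50 red and
    50 white marbles and each marble has the same chance every time."""
    return rng.choices(box, k=size)

sample = draw_with_replacement(box, 30)
print(sample.count("red"), sample.count("white"))

# Repeating the experiment shows the average share of red settling near 1/2.
shares = [draw_with_replacement(box, 30).count("red") / 30 for _ in range(5000)]
average_share = sum(shares) / len(shares)
print(round(average_share, 3))
```

Any single sample of 30 may stray several marbles from an even split; it is the long run of samples that centers on one-half.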

A second type is the stratified sample. When little or nothing 
is known about the internal structure of the universe, only gen- 
eral random sampling can be used. Often, however, there is 
available a considerable amount of information about certain 
characteristics of the population. Good illustrations of the for- 
mer type are the crowds present at Independence Day celebra- 
tions, county fairs, general holiday picnics, the Mardi Gras, and 
the like. Little is known beforehand of the composition of such 
populations. Good examples of the latter type are student bodies 
of colleges, the membership of clubs, and the total census of a 
nation and its subdivisions. Whenever such information is 
available concerning a population, it should be used in making a 
sample of the population. By dividing the universe into strata 
according to known facts and sampling each stratum in propor- 
tion to its relative size in the population, not only can a more rep- 
resentative sample be obtained, but also a smaller sample will 
suffice. 

Let us suppose that we are sampling the student body of a 
university to determine the average annual expenditure of stu- 
dents on picture shows. The catalogue or the report of the 
registrar's office will reveal the portion of the student body which 
falls into each of the following groups, freshmen, sophomores, 
juniors, seniors, graduate students, fraternity and sorority mem- 
bers, etc. If the total number of students is 10,000 and we de- 
sign our sample to include 1% of each student group, our sample 
will contain exactly the same proportions of each group. A total 
sample of 100 chosen by this method would probably represent 
the entire student body better than a sample of two or three times 
that size chosen at random without any consideration of the in- 
ternal structure of the population. It is sampling of this type 
which enables the Gallup Poll and Fortune so accurately to measure 
public opinion from a relatively small percentage of the total 
population. If our sample of the university student body were
not organized in this way, it is possible that we might not get one
graduate student in a sample of 100, and perhaps very few seniors.
The ubiquitous freshmen and sophomores might completely dominate
it. The statistician, therefore, should remember that whenever
exact, specific knowledge is available concerning the internal structure
of any population, he should take advantage of it in designing a
sample for that population, but the sampling within the subdivisions
of the population should be taken at random.
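A proportional stratified sample with random selection inside each stratum can be sketched as follows. The class sizes are hypothetical (the text gives only the 10,000 total and the 1% rate), students are stood in for by serial numbers, and the function name is illustrative. Within each stratum `rng.sample` chooses without replacement, so the selection inside every subdivision stays random.

```python
import random

# Hypothetical class sizes for a university of 10,000 students.
strata_sizes = {
    "freshmen": 3500,
    "sophomores": 2500,
    "juniors": 1800,
    "seniors": 1500,
    "graduate students": 700,
}

def stratified_sample(strata_sizes, fraction, rng):
    """Take the same fraction of every stratum, chosen at random within it.

    Students in a stratum are numbered 0..size-1; rng.sample draws without
    replacement, giving every student in the stratum the same chance."""
    sample = {}
    for name, size in strata_sizes.items():
        k = round(size * fraction)
        sample[name] = rng.sample(range(size), k)
    return sample

rng = random.Random(42)
sample = stratified_sample(strata_sizes, 0.01, rng)
print({name: len(ids) for name, ids in sample.items()})
print(sum(len(ids) for ids in sample.values()))  # 100
```

Because each stratum is sampled at the same 1% rate, the sample keeps the proportions of the student body no matter which individuals the generator happens to pick.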

A purposive sample is one selected to measure some particular 
phase of a population. A definite purpose or result is in mind. 
To attain this result, specific controls are required. Suppose that 
we wished a sample of a student body which would exactly con- 
form to the entire student body on the measure of grade points. 
We would have to know first the number of students having an 
average of 1 grade point, the number having an average of 2 grade 
points, and so on, to the highest level. We would then select 
out of each grade-point level the same ratio of students. If the
numbers of students on the four levels were 500 at 1 point, 6,000 at
2 points, 3,600 at 3 points, and 400 at 4 points, and if we chose 1%
of each level (a sample of 105), it would contain 5 at 1 point, 60 at
2 points, 36 at 3 points, and 4 at 4 points. As far as this one purpose
of grade points is concerned, such a sample would represent the entire
population perfectly. This method has been used to good advantage in
agricultural statistics where very exact knowledge existed con- 
cerning the population and one specific purpose was in view. 
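The purposive sample described above can be checked directly. The sketch takes 1% of each grade-point level given in the text and confirms that the sample's mean grade point equals the population's. It uses exact fractions so that no rounding hides a difference. The match is exact here only because every level divides evenly by 100. The function name is illustrative.

```python
from fractions import Fraction

# Numbers of students at each grade-point level, as given in the text.
levels = {1: 500, 2: 6000, 3: 3600, 4: 400}

# Take the same ratio (1%) from every level.
allocation = {points: count // 100 for points, count in levels.items()}

def mean_grade_points(counts):
    """Exact mean grade point of a group described by {points: number}."""
    total = sum(counts.values())
    return Fraction(sum(points * n for points, n in counts.items()), total)

print(allocation)                  # {1: 5, 2: 60, 3: 36, 4: 4}
print(sum(allocation.values()))    # 105
print(mean_grade_points(levels), mean_grade_points(allocation))  # 83/35 83/35
```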

The fourth method, the stratified purposive sample, is a combination
of the two methods previously discussed, the purposive and
the stratified. If we selected our sample to show grade-point 
averages according to the strata of the student body according 
to classes, societies, and other internal structures, it would be a 
stratified purposive sample. When it is the purpose of the statis- 
tician to make exact, detailed microscopic studies of a few limited 
characteristics of a universe, and sufficiently exact information is 
available, this method gives excellent results. Of course, within 
the limits of the strata and purposes, the sampling is still to be 
taken at random and without bias. 

In any case a researcher, before he spends any considerable 
amount of time or money on a complex statistical analysis, should 
satisfy himself that the sample is adequate and dependable. 






STATISTICS AND PARAMETERS 

A parameter is a true or correct value of a population or universe. 
The true and exact mean, standard deviation, etc., of a popula- 
tion are its parameters. Perhaps in many cases these values will 
never be known. It may require too much expense or time to 
measure the entire population. There may be even greater difficulties
of structure, method, or technique. Parameters may be estimated,
however, from samples. A computation made from a sample is a
statistic. The means, standard deviations, and other measures
computed from samples are statistics.

Since proper and correct sampling is the basis of all significant 
statistical analysis, the student is urged to give special thought to 
all samples on which he works and to tests and checks on ade- 
quate sampling. The final results of the most elaborate and ex- 
tensive analysis are limited by the quality of the sample and are no 
more dependable than the sample. Statistical analysis is not a 
magic mill into which one can blindly feed sand or chaff and get 
out a stream of pure gold or good wheat. To do good sampling, 
one should be thoroughly familiar with the field of data in which 
he works. Correct mathematical procedures are necessary, but 
they alone are not sufficient. Expertness of knowledge in the 
field studied is equally necessary for worthwhile results. 

Samples increase in dependability in proportion to the square
root of their size. It takes a sample of 64 to be twice as dependable
as one of 16: \(\sqrt{16} = 4\); \(\sqrt{64} = 8\). It requires a sample of
400 to be twice as good, and a sample of 900 to be three times as
dependable, as one of 100:

   Sample      Square Root (Ratio of Dependability)
     100             10
     400             20
     900             30
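The square-root rule can be checked numerically. The sketch first reproduces the ratios 1 : 2 : 3 for samples of 100, 400, and 900. Then, as a rough check, it simulates sample means from a hypothetical uniform population to show that going from a sample of 16 to one of 64 roughly halves their spread. The function names, the uniform population, and the seed are assumptions for illustration only.

```python
import math
import random
import statistics

def relative_dependability(n, base=100):
    """Dependability of a sample of n relative to a sample of `base`,
    taking dependability to grow with the square root of sample size."""
    return math.sqrt(n) / math.sqrt(base)

for n in (100, 400, 900):
    print(n, math.sqrt(n), relative_dependability(n))  # e.g. 100 10.0 1.0

# Simulation check with a hypothetical uniform population on [0, 1):
# the spread of sample means for n = 64 should be about half that for n = 16.
rng = random.Random(1)

def spread_of_sample_means(n, trials=2000):
    """Standard deviation of `trials` sample means, each from n draws."""
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]
    return statistics.stdev(means)

ratio = spread_of_sample_means(16) / spread_of_sample_means(64)
print(round(ratio, 2))  # close to 2
```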


PROBABILITY 

Probability is one of the oldest phases of statistics and was first 
developed as a mathematical theory in relation to games of chance. 
Probability means the likelihood of the occurrence of an event. One 
speaks of the probability of rain today or of snow tomorrow, or 
the probability of winning the game Saturday, or of making a
profit on a real estate venture, or of the rise in the price of a certain
stock or of losing a battle or winning a war. Many of our
questions have to be answered by "It probably will" or "It
probably won't." In any case the probability of the event is the
likelihood that it will occur.

The difficulty with the use of the term "probability" in everyday
language is that which is common in the popular use of words. It
lacks scientific precision and definiteness. Professor Richard von
Mises¹ redefines probability in its relation to statistics with a
scientific precision and clarity which makes this concept much more
useful even to a beginning student. In a simple and abbreviated form
one may say, the probability of an event is the relative frequency
with which this event recurs in an indefinitely prolonged sequence or
series of observations. If, for instance, in throwing a die a great
many times, the number 3 came up in 1/6 of all the throws, the
probability of getting a "3" from throwing the die would be 1/6, or
one out of every six throws. If out of a very large number of birth
registrations it appears that 52 out of every 100 births are males,
the probability of a child being a male is .52. The probability is the
limiting value of the ratio of this particular event (birth of a boy)
out of all births.
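Von Mises' definition treats probability as the limiting value of a relative frequency. A small simulation illustrates the idea for the die example: as the number of simulated throws grows, the share showing a 3 settles near 1/6. This is a sketch using Python's pseudo-random generator with an arbitrary fixed seed, not a real die, so the exact figures printed depend on that seed. The function name is illustrative.

```python
import random

rng = random.Random(2024)  # fixed seed, so the run can be repeated exactly

def relative_frequency(face, throws):
    """Share of `throws` simulated throws of a fair die that show `face`."""
    hits = sum(1 for _ in range(throws) if rng.randint(1, 6) == face)
    return hits / throws

for throws in (60, 600, 6_000, 60_000):
    print(throws, round(relative_frequency(3, throws), 4))
print("limiting value 1/6 =", round(1 / 6, 4))
```

Short runs wander noticeably, while the long runs stay close to .1667, which is the sense in which the ratio has a limiting value.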

The probability, the limiting value of the ratio, is established
objectively by experimentation, or by repeated observations.
Clearly, to establish the limiting, or correct, ratio the number of
observations must be large. For instance, among ten births 7 might
be girls, but among 1,000 or among 10,000 births the correct, or
limiting, ratio of .52 would be closely approached. Out of 1,000
births the number of males would be near 520 in a normal population.
Von Mises calls the long sequence of observations a "collective."
We have learned to refer to such large fields of data as populations.
By "collective," von Mises means a specific and clearly differentiated
population of definite attributes. If out of a large sample taken from
a population a certain attribute or event recurs a given number of
times out of the total possible number of times in the sample, that
ratio is its probability. In this sense probability is the final or
limiting value of a ratio based on a large number of observations from
a population, or "collective."

¹ Mises, Richard von, Probability, Statistics and Truth, The Macmillan
Company, N.Y., 1939.

This definition of probability is very explicit and is free from 
all subjective and philosophical content. It is a ratio determinable 
by sampling and objective experimentation. In any case the 
probability of an event is the likelihood that it will occur, this 
likelihood having been established as a dependable ratio by 
scientific statistical procedure. 

Degrees of Probability

The world in which we live is a complex of probabilities which
either singly or in combinations are neither equally certain nor
equally measurable by statistical methods. As samples of the
degrees of variation and uncertainty in probabilities three classes
of events will be considered in this study.

1. The probability of pure chance events.

2. The probability of variations in the occurrences in nature,
physical and biological.

3. The probability of variations in human events and in institutions.

These three classes of probabilities do not cover all cases but 
they do encompass much of the range of varying probabilities. 
The probability of events based on pure chance will be treated 
first. 

Probability of Pure Chance Events 

All chance events may either occur or not occur, or may occur 
in favorable or unfavorable ways. 

If the total possible occurrences are divided into favorable and 
unfavorable events, then 

n = total number of occurrences 

p = number of favorable occurrences 

q = number of unfavorable occurrences 

and \(p + q = n\) and \(\frac{p}{n} + \frac{q}{n} = 1\).

This simple standard equation of probabilities should be kept 
clearly in mind throughout the following analysis. 
Perhaps the simplest and most common pure chance experience
of the average student is the flipping of coins. As two students
go to the fountain one will say, "Let's match for drinks."
"All right, I'll take heads." The coin is flipped and turns up
tails. The student may say, "I'm always unlucky." But in the
field of pure chance, he can't be. If these two students were to go
to the fountain every day for a hundred days and "match for
drinks," and the coin was normal and was fairly thrown, the
results would be about 50-50. At least, in tossing a perfect coin a
large number of times, approximately one-half of the tosses will
turn up tails and one-half will turn up heads. It is not possible to
predict with any degree of accuracy how any particular toss will
turn, but it is possible to predict with nearly complete accuracy
what a thousand fair, free tosses will give. The result would be
approximately 500 heads and 500 tails. This is the principle of
the probability of pure chance. The equation is \(\frac{1}{2} + \frac{1}{2} = 1\). It is
basic in statistical analysis and will be developed more fully later.

Unequal Probabilities 

Not all probabilities fall into 50-50 or equal favorable and
unfavorable distributions. The chance that one may draw the queen
of hearts from a well-shuffled deck of fifty-two playing cards is
only one out of fifty-two, or 1/52. The chances are fifty-one out of
fifty-two that one will not draw the queen of hearts, or 51/52.

\[\frac{1}{52} + \frac{51}{52} = 1\]

These probabilities add up to 1, but they are not equally divided.

Addition of Probabilities

The chances that one will draw a queen are four out of fifty-two,
1/52 + 1/52 + 1/52 + 1/52 = 4/52. The chances that one will draw
a black-faced card are 26/52. The probability that he will draw a
spade is 13/52. The probabilities of single, mutually exclusive
alternatives are additive.

Multiplication of Probabilities 

The chance that two independent events will occur in unison is 
the product of their individual probabilities. The chance that one 
will at the same time, or in immediate succession, draw the king
and queen of clubs is 1/52 × 1/52 = 1/2,704. The probability that one
will draw a club and a spade is 1/4 × 1/4 = 1/16. The more complex
the situation, the greater the number of independent events com- 
bined into one compound event, the smaller is the probability of 
such an event out of the total possible occurrences. This explains 
why one may play cards for hours and even for months without 
getting two hands during this period that are identical. 
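The complement, addition, and multiplication rules for cards can be checked with exact fractions. In the sketch below, the two-draw cases treat the draws as independent. That means assuming the first card is put back and the deck reshuffled before the second draw, which is the condition under which the product rule applies. Python's `Fraction` reduces automatically, so 4/52 prints as 1/13. The variable names are illustrative.

```python
from fractions import Fraction

CARDS = 52
one_card = Fraction(1, CARDS)  # chance of any one named card, e.g. the queen of hearts

# Complement: a named card is either drawn or not drawn.
not_queen_of_hearts = 1 - one_card                      # 51/52

# Addition: mutually exclusive alternatives add.
any_queen = one_card + one_card + one_card + one_card   # 4/52
black_card = 26 * one_card                              # 26/52
spade = 13 * one_card                                   # 13/52

# Multiplication: independent events multiply (first card replaced
# and the deck reshuffled before the second draw).
king_then_queen_of_clubs = one_card * one_card          # 1/2704
club = 13 * one_card
club_then_spade = club * spade                          # 1/16

print(not_queen_of_hearts, any_queen, black_card, spade)  # 51/52 1/13 1/2 1/4
print(king_then_queen_of_clubs, club_then_spade)          # 1/2704 1/16
```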

These relationships may be further illustrated by the throwing 
of dice. What are the chances, what is the probability, that if 
one throws two ordinary, well-formed, evenly balanced dice, 
A and B, he will get a 12 or a 7 or an 11? The following table indi- 
cates these probabilities. 


Get     Combinations (A-B)                    Probability

 12     6-6                                      1/36
 11     5-6, 6-5                                 2/36
 10     4-6, 6-4, 5-5                            3/36
  9     4-5, 5-4, 3-6, 6-3                       4/36
  8     4-4, 3-5, 5-3, 2-6, 6-2                  5/36
  7     3-4, 4-3, 2-5, 5-2, 6-1, 1-6             6/36
  6     5-1, 1-5, 3-3, 2-4, 4-2                  5/36
  5     4-1, 1-4, 2-3, 3-2                       4/36
  4     2-2, 1-3, 3-1                            3/36
  3     1-2, 2-1                                 2/36
  2     1-1                                      1/36

Grand Total                                     36/36
The chance that one will throw a 7 is 6/36, or one time out of 6.
The chance that he will throw a 7 or more than a 7 is 21/36, and the
chance that he will throw 11 or more is 3/36.
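The dice table can be generated rather than written out by hand. The sketch below lists all 36 equally likely pairs from dice A and B, counts the ways of making each total, and reproduces the three probabilities just stated. The function names are illustrative. `Fraction` reduces 21/36 to 7/12 and 3/36 to 1/12.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count the ways each total can arise from dice A and B (36 equally likely pairs).
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))
outcomes = sum(ways.values())

def p_total(t):
    """Probability of throwing exactly t."""
    return Fraction(ways[t], outcomes)

def p_at_least(t):
    """Probability of throwing t or more."""
    return Fraction(sum(n for total, n in ways.items() if total >= t), outcomes)

for total in range(12, 1, -1):
    print(total, f"{ways[total]}/{outcomes}")

print(p_total(7), p_at_least(7), p_at_least(11))  # 1/6 7/12 1/12
```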

Human beings everywhere are deeply interested in probability. 
What is the probability that it will rain today or tomorrow? that 
I will fail my examination or make a grade? or that my in- 
vestment will pay? or that our marriage will be successful? 
What are the chances that I shall escape from a shipwreck or a battle, or be killed in a car accident? What is the likelihood that I shall live to be seventy years old? Day by day we seek to know the probability of events in which we are interested. In statistical analysis an understanding and measurement of probability are of fundamental importance. In statistics no measure is complete until its probability, or standard error, has been computed.

MATHEMATICAL MEASUREMENT OF PROBABILITY 

In order to make the principles and relationships of the proba- 
bility of pure chance events available to the practical statistician, 
it was necessary to describe and measure them in mathematical 
equations. This work has been done by a number of brilliant 
mathematicians¹ over a long period of years. Of the various
equations available two will be employed in this text. They are 
(1) The Binomial Expansion, and (2) The Normal Probability 
Curve, or Normal Curve of Error. 

The Binomial Expansion 

The student has learned in elementary algebra how the quantity $(a + b)$ may be expanded to any desired power, such as

$$(a + b)^n = a^n + na^{n-1}b + \frac{n(n-1)}{1 \cdot 2}a^{n-2}b^2 + \cdots + 1b^n.$$

Thus, if the computation is for $(a + b)^2$, the result is $1a^2 + 2ab + 1b^2$,

and $(a + b)^3 = 1a^3 + 3a^2b + 3ab^2 + 1b^3$,

and $(a + b)^4 = 1a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + 1b^4$.

¹ Laplace, Pierre Simon, 1749-1827, greatest of French astronomers and mathematicians. Also, Gauss, Karl Friedrich, 1777-1855, German mathematician of the University of Göttingen. See Encyclopedia Britannica, or New International Encyclopedia.
For any power of $(a + b)$ the coefficients of the several terms express and describe the total number of possible chance combinations for that particular power. For $(a + b)^2$ these are 1 + 2 + 1 = 4, for $(a + b)^3$ they are 1 + 3 + 3 + 1 = 8, and for $(a + b)^4$ they are 1 + 4 + 6 + 4 + 1 = 16. The number of possible chance combinations doubles for each successive power.
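The coefficients and their totals for any power can also be computed directly, since the coefficient of the term containing $b^k$ in $(a + b)^n$ is the number of combinations of n things taken k at a time. A minimal Python sketch using the standard-library `math.comb` (Python 3.8 or later); `binomial_coefficients` is an illustrative name.

```python
from math import comb

def binomial_coefficients(n):
    """Coefficients of (a + b)**n: the number of chance combinations for each term."""
    return [comb(n, k) for k in range(n + 1)]

for n in range(1, 5):
    coeffs = binomial_coefficients(n)
    # The total doubles with each power, since it equals 2**n
    print(f"(a + b)^{n}: {coeffs}  total = {sum(coeffs)}")
```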

Since in an occurrence based on pure chance, such as the tossing of a coin, the possibilities of favorable and unfavorable events are equal (the coin has an equal chance to turn heads or tails), $p = q = \frac{1}{2}$ and the equation may be written:

$$(p + q)^n = \left(\tfrac{1}{2} + \tfrac{1}{2}\right)^n$$


Possible Occurrences For Various Numbers of Coins 

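If two coins, a and b, are thrown, the total possible combinations of occurrences are:

a b   a b   a b
H H   H T   T T
      T H

or $\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^2 = \left(\tfrac{1}{2}\right)^2 + 2\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) + \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4} + \tfrac{2}{4} + \tfrac{1}{4} = \tfrac{4}{4} = 1$.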


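If three coins, a, b, and c, are tossed, the possible results are:

a b c   a b c   a b c   a b c
H H H   H H T   H T T   T T T
        H T H   T H T
        T H H   T T H

$\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^3 = \left(\tfrac{1}{2}\right)^3 + 3\left(\tfrac{1}{2}\right)^2\left(\tfrac{1}{2}\right) + 3\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^3 = \tfrac{1}{8} + \tfrac{3}{8} + \tfrac{3}{8} + \tfrac{1}{8} = \tfrac{8}{8} = 1$.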

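If four coins, a, b, c, and d, are tossed, the total possible results are:

a b c d   a b c d   a b c d   a b c d   a b c d
H H H H   H H H T   H H T T   T T T H   T T T T
          H H T H   H T H T   T T H T
          H T H H   T H T H   T H T T
          T H H H   T T H H   H T T T
                    T H H T
                    H T T H

$\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^4 = \left(\tfrac{1}{2}\right)^4 + 4\left(\tfrac{1}{2}\right)^3\left(\tfrac{1}{2}\right) + 6\left(\tfrac{1}{2}\right)^2\left(\tfrac{1}{2}\right)^2 + 4\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)^3 + \left(\tfrac{1}{2}\right)^4 = \tfrac{1}{16} + \tfrac{4}{16} + \tfrac{6}{16} + \tfrac{4}{16} + \tfrac{1}{16} = \tfrac{16}{16} = 1$.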




328 SAMPLING, PROBABILITY, AND ERROR 


In like manner this binomial equation of pure chance occurrences may be expanded to any desired power.
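Listing every toss by machine shows the same correspondence between outcomes and coefficients. This Python sketch assumes fair coins tossed independently; `heads_distribution` is an illustrative name, not something from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_coins):
    """Probability of each number of heads when n fair coins are tossed together."""
    tosses = list(product("HT", repeat=n_coins))   # every equally likely result
    counts = Counter(toss.count("H") for toss in tosses)
    return {h: Fraction(counts[h], len(tosses)) for h in range(n_coins, -1, -1)}

for n in (2, 3, 4):
    dist = heads_distribution(n)
    print(n, "coins:", [str(p) for p in dist.values()], "sum =", sum(dist.values()))
```

The probabilities for four coins, multiplied by 16, are 1, 4, 6, 4, 1, the coefficients of $(a + b)^4$, and for every n they add to 1.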

Binomial Triangle 

The coefficients of the terms of the binomial expanded to suc- 
cessive powers may be recorded in the form of a right triangle 
which materially reduces the labor of computing the successive 
equations. 


TABLE 33
Binomial Triangle

Powers  Totals  Coefficients
1st          2     1   1
2nd          4     1   2   1
3rd          8     1   3   3   1
4th         16     1   4   6   4   1
5th         32     1   5  10  10   5   1
6th         64     1   6  15  20  15   6   1
7th        128     1   7  21  35  35  21   7   1
8th        256     1   8  28  56  70  56  28   8   1
9th        512     1   9  36  84 126 126  84  36   9   1
10th     1,024     1  10  45 120 210 252 210 120  45  10   1
11th     2,048     1  11  55 165 330 462 462 330 165  55  11   1
12th     4,096     1  12  66 220 495 792 924 792 495 220  66  12   1


To arrange this table, (1) set down as left-hand stubs the successive powers desired under the heading of Powers. (2) Write down 1 as the first coefficient of each power. (3) Form the next column by writing 1 in the first line, and to form the number (coefficient) below it, add to it the number directly to its left, as 1 + 1 = 2, 2 + 1 = 3, 3 + 1 = 4, etc. (4) Begin each succeeding column one line lower with 1 as its first number, and obtain the successive numbers (coefficients) below it by adding to each number in that column the number in the same line immediately to its left. Begin the third column opposite Power 2 in the second line with 1. To get the 3 below it, add the 2 beside it to the 1, as 1 + 2 = 3, then 3 + 3 = 6, and 6 + 4 = 10, and 10 + 5 = 15, and so on for all other coefficients. By this method the student can quickly obtain whatever set of coefficients he may require for a given frequency distribution. The column headed Totals, next to the column headed Powers, contains the total of all coefficients for that power. The totals double for each successive power.
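The construction rule above, in which each new coefficient is the number above it plus the number to the left of that one, translates directly into a loop. This is a Python sketch; `binomial_triangle` is a hypothetical helper, and rows are stored as complete lists rather than as the staggered columns of Table 33.

```python
def binomial_triangle(max_power):
    """Rows of binomial coefficients for powers 1..max_power, built by addition."""
    rows = [[1, 1]]                       # 1st power: (a + b)^1
    for _ in range(max_power - 1):
        prev = rows[-1]
        # each inner coefficient = the one above it + the one to the left of that
        inner = [prev[k - 1] + prev[k] for k in range(1, len(prev))]
        rows.append([1] + inner + [1])
    return rows

for power, row in enumerate(binomial_triangle(12), start=1):
    print(f"{power:>2}  {sum(row):>5,}  {row}")
```

Only additions are needed, which is why the triangle saves so much labor compared with expanding each power separately.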

WORKSHEET NO. 44

Comparison of Theoretical and Actual Distributions of Throws
of Twelve Coins (Pennies) 4,096 Times

Number of       Frequency of Throws
Heads up   Theoretical       Actual
in Throw        Number       Number
   0                 1            1
   1                12           15
   2                66           61
   3               220          205
   4               495          506
   5               792          783
   6               924          926
   7               792          805
   8               495          504
   9               220          212
  10                66           60
  11                12           16
  12                 1            2
Totals           4,096        4,096


These two frequencies are plotted on Fig. 63. It should be observed that the binomial expanded to the 12th power does not give a continuous smooth curve, but only thirteen points on such a curve. If, however, the number of coins and throws were increased to a very large number, these plotted points would become more numerous and closer together, and as the number of coins and throws continued to increase, the curve would ultimately approach a smooth continuous line. Only with this reservation, that the sample be infinitely large, that is, that it approach the limits of an infinite population, can the binomial expansion actually be a full and accurate measure of the probabilities of pure chance events. However, for practical purposes an ideal binomial expansion frequency curve may be fitted to a finite or limited sample to discover whether, and how nearly, the sample has the form of a true binomial curve of a given power. It is a useful device for describing and measuring a frequency distribution. Such an equation is called the point binomial, because it gives only a finite or limited number of points and not a smooth curve.
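The comparison in Worksheet No. 44 can be sketched in Python: the point binomial gives the expected frequency of k heads as the number of throws times $\binom{12}{k}/2^{12}$. The actual counts are copied from the worksheet; the script only tabulates the differences and makes no test of their significance.

```python
from math import comb

N_COINS, N_THROWS = 12, 4096

# Actual counts of 0, 1, ..., 12 heads, copied from Worksheet No. 44
actual = [1, 15, 61, 205, 506, 783, 926, 805, 504, 212, 60, 16, 2]

# Point binomial: expected frequency = throws * C(12, k) / 2**12
theoretical = [N_THROWS * comb(N_COINS, k) / 2 ** N_COINS for k in range(N_COINS + 1)]

for k, (t, a) in enumerate(zip(theoretical, actual)):
    print(f"{k:>2} heads: theoretical {t:5.0f}  actual {a:4d}  difference {a - t:+4.0f}")
print("totals:", sum(theoretical), sum(actual))
```

Because 4,096 is exactly $2^{12}$, the theoretical frequencies are simply the 12th-power coefficients of Table 33.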

Fig. 63. Comparison of theoretical and actual frequencies of 12 coins thrown 4,096 times, as shown in Worksheet No. 44 (vertical axis: frequency, up to 960; horizontal axis: number of heads, 0 to 12)

THE LAW OF PURE CHANCE OCCURRENCES 

From the relationship revealed in the equations above and illus- 
trated in Fig. 63 the following generalization may be made. 

Variate values of pure chance occurrences tend to be distributed about their mean approximately in proportion to the coefficients of a suitable binomial expanded to the nth power.






The twelve-sided polygon shown in Fig. 63 would change to a 
figure of more and shorter sides as the powers increased until at 
infinity it would constitute a perfectly continuous smooth curve. 


THE NORMAL CURVE 

This perfectly smooth curve, which results when n, the power of the binomial, is increased to infinity, is called the normal curve, or the normal probability curve, or the normal curve of error. It is sometimes referred to as the Gaussian curve or the Laplacian curve, from the names of two prominent mathematicians who contributed to its development.

The equation of this curve may be written in several forms, of which the following is perhaps the simplest and most useful for the elementary student.

Meaning of symbols:

$Y_c$ = the computed height of an ordinate at a distance x from the arithmetic mean, $\bar{X}$
i = class interval
$\sigma$ = standard deviation of the sample
$\pi$ = the constant 3.1416; $\sqrt{2\pi} = 2.5066$
e = the constant 2.71828, the base of natural logarithms
x = any selected deviation from the arithmetic mean
N = number of observations in the sample

Formula No. 54

$$Y_c = \frac{Ni}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}$$

If we substitute the two constants, $\sqrt{2\pi} = 2.5066$ and $e = 2.71828$, in the equation, it becomes

$$Y_c = \frac{Ni}{2.5066\,\sigma}\, 2.71828^{-\frac{x^2}{2\sigma^2}}$$

At the mean $x = 0$, so $\frac{x^2}{2\sigma^2} = 0$, and the ordinate at the mean is

$$Y_0 = \frac{Ni}{2.5066\,\sigma}$$

By substituting in this equation Ni, the number of items in the sample times the class interval, together with the standard deviation of the sample and any particular x, one may compute the ordinate $Y_c$, and by varying x may compute as many ordinates as desired. This formula is rather complex, and it is not expected that the elementary student will spend much time with it. Its form is shown in Fig. 64.
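For readers who want to try Formula No. 54 without the hand arithmetic, here is a minimal Python sketch. It uses the exact values of π and e rather than 2.5066 and 2.71828, and it does not round the maximum ordinate to 18.0, so its results (about 17.995 at the mean, 15.880 at ½σ, and 10.914 at σ) differ slightly from the rounded figures in the text. `normal_ordinate` is an illustrative name; N = 106, i = 2 and σ = 4.7 are the school-children figures used in this chapter.

```python
from math import exp, pi, sqrt

def normal_ordinate(x, n, i, sigma):
    """Formula No. 54: height of the fitted normal curve at deviation x from the mean."""
    y0 = n * i / (sqrt(2 * pi) * sigma)          # maximum ordinate, at the mean
    return y0 * exp(-x ** 2 / (2 * sigma ** 2))

# Heights of 106 school children: class interval 2, standard deviation 4.7
N, I, SIGMA = 106, 2, 4.7
for x in (0, SIGMA / 2, SIGMA):
    print(f"x = {x:5.3f}: Yc = {normal_ordinate(x, N, I, SIGMA):.3f}")
```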




Fig. 65. Showing normal curve of error and percentages of 
items included within various ranges of error 


Ordinates of the normal probability curve are shown in Appendix Table 1. This table is quite useful in fitting normal curves to frequency distributions. The normal curve may also be fitted to any given frequency distribution by the use of the formula itself. If we make a histogram of the frequency distribution used in Worksheet No. 1, we may also plot on this same graph a normal curve. The ordinates of the normal curve are obtained as follows:

1. Take the formula $Y_c = \frac{Ni}{2.5066\,\sigma}\, 2.71828^{-\frac{x^2}{2\sigma^2}}$.

2. Select x's which are some exact simple fractions of the standard deviation of the sample, such as $\frac{1}{4}\sigma$, $\frac{1}{2}\sigma$, $\frac{3}{4}\sigma$, etc., and substitute them in the above formula successively until sufficient ordinates are obtained to outline a smooth curve.

3. Computations: σ = 4.7. Take ordinates at 1.175, 2.35, 3.525, 4.7, 5.875, 7.05, 8.225, 9.4, 10.575, and 11.75, or successive ordinates one-quarter of a standard deviation apart on the X-axis from the mean.

If in the equation

$$Y_c = \frac{Ni}{2.5066\,\sigma}\, 2.71828^{-\frac{x^2}{2\sigma^2}}$$

are substituted the values for N and i in the problem on heights of school children, and x is set equal to one standard deviation, 4.7 in this case, the solution of the equation becomes

$$Y_c = \frac{106 \times 2}{2.5066 \times 4.7}\, 2.71828^{-\frac{(4.7)^2}{2(4.7)^2}} = \frac{212}{11.78102}\, 2.71828^{-\frac{22.09}{2(22.09)}}$$

$$= 17.995 \times \frac{1}{2.71828^{\frac{1}{2}}} = 18.0 \times \frac{1}{\sqrt{2.71828}} = 18.0 \times \frac{1}{1.6487} = 18.0 \times .60653 = 10.9175$$

which is the ordinate of the normal curve at a distance of one σ on either side of the mean. By substituting other values for x in terms of σ, as many ordinates may be computed as desired.

For $x = \frac{1}{2}\sigma$, or 2.35 in this case, the solution is

$$Y_c = 18.0 \times 2.71828^{-\frac{(2.35)^2}{2(4.7)^2}} = 18.0 \times 2.71828^{-\frac{5.5225}{2(22.0900)}} = 18.0 \times 2.71828^{-\frac{5.5225}{44.18}} = 18.0 \times 2.71828^{-\frac{1}{8}}$$

$$= 18.0 \times \frac{1}{2.71828^{\frac{1}{8}}} = 18.0 \times \frac{1}{\sqrt[8]{2.71828}} = 18.0 \times \frac{1}{1.1331} = 18.0 \times .88250 = 15.885$$

which is the ordinate at a distance of one-half σ on either side of $\bar{X}$ in this sample of heights of school children.

This method of computation of ordinates is so tedious that 
tables of ratios have been developed which greatly facilitate the 
computations. 

If in the equation of the normal curve x is set equal to zero (0), the solution is simplified to

$$Y_c = \frac{Ni}{2.5066\,\sigma}\, 2.71828^{-\frac{(0)^2}{2(4.7)^2}}$$

Since any quantity raised to the power zero equals 1, $2.71828^{-\frac{(0)^2}{2(4.7)^2}}$ becomes one (1), and the maximum ordinate for any distribution may be computed from the formula

$$Y_0 = \frac{Ni}{2.5066\,\sigma}$$

which in this case becomes

$$Y_0 = \frac{212}{11.78102} = 17.995 = 18.0$$

By selecting the ratios in Appendix Table 1 for desired values of $\frac{x}{\sigma}$, such as $\frac{1.175}{4.7} = .25$, $\frac{2.35}{4.7} = .5$, $\frac{4.7}{4.7} = 1.0$, etc., the student may quickly compute all the desired ordinates by multiplying the maximum ordinate by the desired ratio in Appendix Table 1. For this sample of heights of school children they are shown in Worksheet No. 45.

The superimposed normal curve must be centered at the mean of the histogram or frequency polygon of the sample, as is shown in Fig. 66.
WORKSHEET NO. 45

Computation of Ordinates of Normal Curve at Intervals
of .25σ from Mean. Based on Appendix III

Distance from   Ratio to Ordinate      Mean        Ordinate in
Mean (in σ)     at Mean                Ordinate    Terms of Data
    .00            1.00000       ×       18     =      18.0000
    .25             .96924       ×       18     =      17.4463
    .50             .88250       ×       18     =      15.8850
    .75             .75484       ×       18     =      13.5871
   1.00             .60653       ×       18     =      10.9175
   1.25             .45783       ×       18     =       8.2409
   1.50             .32465       ×       18     =       5.8437
   1.75             .21627       ×       18     =       3.8929
   2.00             .13534       ×       18     =       2.4361
   2.25             .07956       ×       18     =       1.4321
   2.50             .04394       ×       18     =        .7909
   2.75             .02280       ×       18     =        .4104
   3.00             .01111       ×       18     =        .2000


Fig. 66. Histogram of frequency distribution of heights of 106 Stillwater, Oklahoma, grade-school children and superimposed normal probability curve
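The ratio method of Worksheet No. 45 is equally short in code. Each ratio is $e^{-z^2/2}$, where z is the distance from the mean in standard deviations; this Python sketch computes the ratio directly instead of looking it up in an appendix table, and uses the rounded maximum ordinate 18.0 as the text does. `ordinate_table` is an illustrative name.

```python
from math import exp

def ordinate_table(max_ordinate, step=0.25, last=3.0):
    """Rows of (z, ratio to ordinate at mean, ordinate in terms of the data)."""
    rows = []
    for k in range(round(last / step) + 1):
        z = k * step                      # distance from the mean, in sigmas
        ratio = exp(-z * z / 2)           # height relative to the maximum ordinate
        rows.append((z, ratio, max_ordinate * ratio))
    return rows

for z, ratio, ordinate in ordinate_table(18.0):
    print(f"{z:5.2f}  {ratio:.5f}  x 18 = {ordinate:8.4f}")
```

Computed this way, a ratio can differ from a printed table in the last decimal place (for example .96923 at .25σ), which does not affect the plotted curve.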

The Normal Curve with Physical and Biological Data

The normal curve as explained in the previous sections of this 
chapter deals with occurrences based on pure chance. The logical 
principles underlying this curve are: 

1. The forces affecting individual events are independent of 
each other. 

2. The forces affecting individual events are many and of ap- 
proximately equal weight. 

3. These forces operate so as to produce deviations from the 
mean that are about equal in size and number. 

In the realm of pure chance these statements are correct, and 
fortunately they also apply with a large degree of accuracy in the 
physical and biological fields. The leaves of a tree ordinarily 
vary in length approximately according to these principles. 
There will be a few very short leaves and very long leaves. As the lengths of the leaves approach the mean, their numbers increase according to the frequencies of the normal curve. The heights of trees of the same variety, age, and location, the heights of mature men of the same race, the sizes and weights of animals of the same species and age, and most other natural phenomena tend to conform to the normal curve.
This conformity of natural phenomena to the normal curve of 
pure chance occurrences is so close that the general law may be 
stated as 

Variate values of a natural phenomenon tend to be symmetrically distributed about the mean in proportions conforming to the law of chance distributions.¹

This similarity between pure chance occurrences and the distribution of most natural phenomena widens the scientific and mathematical application of statistical analyses to a large portion of the physical and biological universe. Since the same three laws of (1) independent forces, (2) numerous and equal forces, and (3) equal deviations from the mean obtain and operate in the physical and biological fields, there is a sound logical basis for expecting such phenomena to conform to the normal curve. The wide prevalence of these basic relationships in all these fields lays the foundation for logical statistical inferences from samples in these fields. In fact, wherever these laws apply, statistical measures are more than descriptive devices. Under such favorable conditions statistics become the secure bases of scientific inference and forecasting. Out of such studies science is built.

¹ Jerome, Harry, Statistical Methods, Harper & Brothers, New York, 1924, p. 150.

Analysis of Normal Curve 

Within the normal probability curve there exist certain fixed 
ratios, relations, proportions and distributions which may be 
stated in standard statistical and mathematical terms and uni- 
versally applied with a high degree of accuracy for all distribu- 
tions of data approximating the form of the normal curve. 

Relationships in Normal Curve 

1. The arithmetic mean, median, and mode coincide. 

2. Ordinates erected one standard deviation on either side of the mean cut the curve at its points of inflection (the points on the slopes where the curvature reverses).

3. The first and third quartiles are equally distant from the 
median. 

4. Within one standard deviation plus and minus from the mean, 68.27% of the items fall.

5. Two standard deviations taken plus and minus from the 
mean contain 95.45% of the items. 

6. The average deviation is .7979 of the standard deviation. 

7. The semi-interquartile range equals the probable error, which 
equals .6745 of the standard deviation. 

For any data which fall approximately in the form of the normal 
curve and are correctly based on the three logical principles under- 
lying the normal curve, the seven ratios stated above will hold 
true and may logically be deduced from the samples as applicable 
to the population from which they are taken. 
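Several of these ratios can be confirmed numerically with Python's standard `statistics.NormalDist` class (Python 3.8 or later). The sketch uses a standard normal curve (mean 0, σ = 1), so each result is already expressed in units of σ; the average-deviation ratio uses the known closed form $\sqrt{2/\pi}$ rather than a numerical integration.

```python
from math import pi, sqrt
from statistics import NormalDist

z = NormalDist()                        # standard normal curve: mean 0, sigma 1

within_1_sigma = z.cdf(1) - z.cdf(-1)   # share of items within ±1σ
within_2_sigma = z.cdf(2) - z.cdf(-2)   # share of items within ±2σ
average_deviation = sqrt(2 / pi)        # average deviation as a fraction of σ
probable_error = z.inv_cdf(0.75)        # distance from median to third quartile, in σ

print(f"±1σ: {within_1_sigma:.2%}   ±2σ: {within_2_sigma:.2%}")
print(f"A.D./σ = {average_deviation:.4f}   P.E./σ = {probable_error:.4f}")
```

It prints 68.27% and 95.45% for the two ranges and 0.7979 and 0.6745 for the two ratios, matching relationships 4 through 7.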

For example, one may logically expect these ratios to hold for variations in the heights of men, the lengths of tree leaves, the sizes of grains of sand, and the velocities and forces of molecules of gas striking the walls of their container. They would also apply to any relationship in the social sciences, economics, psychology, sociology, and education in which it may be established that the basic relationships of the normal frequency curve apply.

PROBABILITY OF VARIATION IN HUMAN BEHAVIOR 
AND IN SOCIAL EVENTS 

The major difficulty in applying the analysis of the normal curve to social data is that in this field the three basic logical principles on which the normal curve is based are not entirely true. A coin or a die when freely thrown has an equal chance to fall either way, but in human and social relationships one is not fully free either in heredity or environment. The children of the mentally deficient do not have a full free chance to be mentally normal or superior. They are to some extent bound by the genes of their parents and are more likely to be like them than different from them. Like begets like. The children of poor parents who lack education and culture are more likely to lack culture and education, even in a democracy, than the children of well-to-do parents. Culture, too, tends to perpetuate itself. Instead of a child's being as free as a thrown penny to turn whichever way it falls, it is held and biased in its life by its heredity and environment to some extent, perhaps in many cases to a large extent. The same principle affects groups of persons from the local communities to the state and nation. Certain biases, traits, characteristics, bents, prejudices, handicaps, and peculiarities tend to persist and to determine the lives of many succeeding generations.

These traits and biases tend to crystallize into customs, traditions, and institutions which are imposed on children from birth and mold and color their thinking, characters, and lives in spite of even strong individual tendencies to “fall” some other way.

These forces affecting human beings are numerous, but they are often neither independent nor evenly balanced. They frequently do not tend to produce variations of equal extent and weight on either side of their mode. Their mode, median, and mean do not coincide, nor are their quartiles equally distant from their median. In other words, they are not normal but are definitely skewed either to the right or to the left. The sons and daughters of professional families tend to become professional people. The children of day laborers tend to become day laborers. The sons of planters tend to become planters, and the sons of sharecroppers tend to follow in their fathers' economic status.

Three factors make it much more difficult to generalize from 
the normal curve in the social sciences than in the physical and 
biological fields. 

1. The data are much more complex and variable. 

2. The data are usually skewed to a marked degree. 

3. The complexity, variability and skewness of the data are not 
uniform and permanent but are subject to trends, cycles and com- 
plex shifts. 

These conditions make dependable research more difficult in
the social science field than in the physical and biological domains; 
but the student must not conclude that statistics are useless in 
social studies, because it is here that they are most necessary and 
relatively fruitful. It simply means that in this more difficult 
field the statistician must be more careful and better equipped for 
his work. He must become a master of both statistical techniques 
and subject matter. 


STATISTICAL INFERENCE 

All statistical computations and studies tend to lead to statis- 
tical inferences. One is likely automatically to infer that any 
sample represents its population, although the discrepancy may 
be great. It requires a careful and scientifically minded person 
not to jump to the conclusion that any mean computed necessarily 
correctly measures the population. So it is with all other statis- 
tical measures and computations. In any case straight logical 
thinking is necessary. In no case should the conclusions contain 
more than the sample and premises warrant. Because in the physical and biological sciences the data do conform in large measure to the properties of the normal curve, more rigid conclusions can be deduced from less evidence than in the more complex social sciences. Smaller samples suffice, and less critical and less highly refined methods may be used with a high degree of certainty. It is easier to measure accurately the length of oak leaves, solar radiation, and the distances to the stars than it is to measure a potential market, public opinion, labor union behavior, or the business cycle. In the latter cases the conclusions must rest on larger samples hedged about by more exhaustive analyses and computations of error. During the past forty years great refinements have been made in social statistics. These are presented in this text in Part II and Part III in “Large Sample Analysis” and “Time Series Analysis.”

In the social sciences the normal curve is still the best basis for statistical inference, but it must be used with greater care and hedged with superior techniques for computing error. If the data are skewed too much to employ the arithmetic mean, the geometric and harmonic means are available.

When applied to data which are skewed far to the right, both of these means tend to give the data the form of a normal curve and make their logical analysis on a ratio or percentage basis quite accurate and dependable.
Multiple and partial correlations have contributed much to the 
analysis of social data. Multiple and curvilinear regressions also 
have been quite useful. When sufficiently large samples are 
available, tabular analysis is effective and highly flexible. Factor 
analysis and other more complex methods are being gradually im- 
proved or perfected, and are quite applicable to social data. 

In the social data fields one of the largest problems is the selec- 
tion of an adequate sample and the computation of its various 
measures of error. It is in fact in these two phases of statistical 
work (1) adequate sampling and (2) the computation of errors 
that the statistician in the social sciences must be especially 
alert and proficient. If he is so, he may carry many logical and 
dependable inferences from his sample to its population. The rapid advancement in the statistics of economics, education, psychology, sociology, and business administration during the past two decades has demonstrated these possibilities in all of these fields. The basic principles of statistical inference are the same here as elsewhere, but they must be applied with more complex and adequate methods.


ERROR 

The primary meaning of error is deviation or difference. Even a social error is only a deviation from the accepted mode of behavior. If the norm is fixed and accepted without question, as the relations of mathematics are, error comes to mean mistake.

If the product of two times two is stated as five, or anything other than four, the computer is said to have made a mistake, or an error. Error is not used in statistics in the sense of mistake, but is used only in the sense of difference or deviation. In computing the statistics of a sample as a means of measuring the parameters of a population, it is practically certain that the statistics and the parameters will not be identical. If the exact mean of the heights of all the men over twenty years old in the world is 67.437 inches, it is most likely that no sample of this population would yield exactly an identical statistic. This deviation between the parameter and the statistic is the error, or deviation.

Standard Error of the Mean 

If one computes the mean of a random sample taken from a population, the question always arises: How near does the mean of the sample (a statistic) conform to the true mean of the universe (a parameter)? How can this question be answered? If we can never know the parameter, how can we know how far the statistic misses it? Let us take another sample from the same population and compute its mean. Will the means of the two samples be identical? Usually not. Then the question arises as to which mean is better. Neither one is absolutely identical with the parameter. Let us take other samples and compute a large number of means. If this is done, it will be found that these means tend to fall into a small frequency distribution approaching a normal curve of error in shape. The means of thirty samples of 26 items each from the heights of the grade-school children were computed with the following results:
TABLE 34

Means of Heights of School Children

52.8   52.7   51.1   52.0   51.6   51.8   52.2   52.1   52.7   53.1
52.5   51.8   52.1   53.4   53.7   52.3   53.2   52.4   52.7   53.4
52.6   52.9   52.4   52.2   52.1   52.5   51.8   53.4   52.2   51.5

Frequency Distribution of Means

Class Intervals    Frequencies
51.0-51.4                1
51.5-51.9                5
52.0-52.4               10
52.5-52.9                8
53.0-53.4                5
53.5-53.9                1
Total                   30


The question still remains: What is the true mean (parameter) of the universe? We can never know the exact parameter, but we have discovered the method of closely approximating it. Since a mean is the most representative number in a group, the mean of these 30 means would likely be nearer the true parameter than any single one of them. The mean of the means is

$$\bar{X} = \frac{\Sigma X}{N} = \frac{1573.2}{30} = 52.44$$

The standard deviation of these 30 means is 0.62 inches. This result indicates that if we continued to compute other means from similar samples, about 68 out of every 100 such means would fall between 52.44 ± .62, or 51.82 to 53.06. While we have not located the exact parameter, we have set up the limits within which it very probably falls. With a considerable degree of accuracy we have measured the error of our statistic, the error of the mean.
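The mean and standard deviation of the thirty sample means in Table 34 can be reproduced with Python's `statistics` module. One assumption to flag: the text does not say whether its 0.62 divides by N or by N - 1. `statistics.stdev`, used below, divides by N - 1 and gives about .62; `statistics.pstdev` divides by N and gives about .61.

```python
from statistics import mean, stdev

# Table 34: means of thirty samples of 26 heights each (inches)
sample_means = [
    52.8, 52.7, 51.1, 52.0, 51.6, 51.8, 52.2, 52.1, 52.7, 53.1,
    52.5, 51.8, 52.1, 53.4, 53.7, 52.3, 53.2, 52.4, 52.7, 53.4,
    52.6, 52.9, 52.4, 52.2, 52.1, 52.5, 51.8, 53.4, 52.2, 51.5,
]

grand_mean = mean(sample_means)     # 1573.2 / 30
spread = stdev(sample_means)        # divides by N - 1

print(f"mean of means = {grand_mean:.2f}, standard deviation = {spread:.2f}")
print(f"one standard deviation either side: "
      f"{grand_mean - spread:.2f} to {grand_mean + spread:.2f}")
```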
This method, however, involves a great deal of labor. The computation of 100, or 50, or even 30 means and their common mean is no small task. The mathematicians have evolved a shorter method that for all practical purposes is as good as that used above. It is called the standard error of the mean. It is computed as follows:

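1. Take one adequate random sample from the population.

2. Compute its $\bar{X}$ and $\sigma$.

Formula No. 55

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N - 1}}$$

In this case the values are:

$$\sigma_{\bar{X}} = \frac{4.3}{\sqrt{26 - 1}} = \frac{4.3}{5} = .86$$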

The value by this method (.86) is somewhat larger than that obtained by computing the standard deviation of the 30 means, but if the size of the samples were greater, the two values would be closer together. For our larger sample (106 items),

$$\sigma_{\bar{X}} = \frac{4.7}{\sqrt{106 - 1}} = \frac{4.7}{10.25} = .46$$

This is a short and adequate method for computing the reliability of a mean. The computation of no mean is complete without the computation of its standard error. For the mean of 30 means, it is $\bar{X} \pm \sigma_{\bar{X}} = 52.44 \pm .46$, or 51.98 to 52.90. This is locating the mean of the population quite accurately. If we wish to be more certain, we take $\bar{X} \pm 2\sigma_{\bar{X}}$, as 52.44 ± .92, which is 51.52 to 53.36. This measure sets the limits within which 95 out of 100 possible means from various samples would fall.
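Formula No. 55 and the two ranges above can be sketched as follows. This Python version follows the text's N - 1 form of the denominator; `standard_error_of_mean` is an illustrative name, and the center 52.44 and the σ values 4.3 and 4.7 are those quoted in the text.

```python
from math import sqrt

def standard_error_of_mean(sigma, n):
    """Formula No. 55 as given in the text: sigma / sqrt(N - 1)."""
    return sigma / sqrt(n - 1)

se_26 = standard_error_of_mean(4.3, 26)     # one sample of 26 heights
se_106 = standard_error_of_mean(4.7, 106)   # the full sample of 106 heights
print(f"N = 26: {se_26:.2f}   N = 106: {se_106:.2f}")

center = 52.44                              # mean of the thirty means
for k in (1, 2):
    print(f"{center} ± {k} standard errors: "
          f"{center - k * se_106:.2f} to {center + k * se_106:.2f}")
```

The two printed ranges are 51.98 to 52.90 and 51.52 to 53.36, as in the text.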

Since there is no assurance that any particular sample mean will fall near the center of the frequency distribution of many sample means, it is necessary to include $2\sigma_{\bar{X}}$ or even $3\sigma_{\bar{X}}$ to be reasonably certain of reaching the population mean. If we take the largest mean (53.7) of the thirty means computed above as an example,

$53.7 \pm .86$, or 54.56 to 52.84,

we find that $1\sigma_{\bar{X}}$ does not include the mean of the thirty means, 52.44. It would require

$53.7 \pm 2\sigma_{\bar{X}} = 53.7 \pm (2 \times .86) = 53.7 \pm 1.72$, which is 55.42 to 51.98,

to reach beyond the mean of the thirty means, which is probably nearer the true parameter than any other figure we could get.

On the other hand, if we by chance should get for our one sample the smallest one above (51.1), it would also require $2\sigma_{\bar{X}}$ to include the mean of means computed above:

$51.1 \pm .86$, or 51.96 to 50.24.

51.96 is too small to include 52.44.

$51.1 \pm 2\sigma_{\bar{X}} = 51.1 \pm (2 \times .86) = 51.1 \pm 1.72$, or 52.82 to 49.38.

52.82 does reach beyond the 52.44 average of the thirty means.

It is this likelihood that the one sample taken to compute our 
mean and standard error of the mean may be an extreme one 
(either too large or too small) that compels one to use two or even 
three standard errors of the mean to be certain of including in 
its limits the parameter of the population. 

Standard Error of Standard Deviation 

A similar measure may be computed for the standard deviation of a sample as follows:

Formula No. 56

$$\sigma_{\sigma} = \frac{\sigma}{\sqrt{2(N - 1)}} = \frac{4.7}{\sqrt{2(106 - 1)}} = \frac{4.7}{14.49} = .32$$


This means that in 68 out of 100 samples the standard deviation would fall within a range of ± .32 of 4.7, that is, for this sample, between 4.7 ± .32, or 4.38 to 5.02. The larger the sample, the more accurate is the measure.
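Formula No. 56 as a small Python function, using the sample of 106 heights with σ = 4.7; `standard_error_of_sd` is an illustrative name. Carried at full precision, the denominator is $\sqrt{210} \approx 14.49$.

```python
from math import sqrt

def standard_error_of_sd(sigma, n):
    """Formula No. 56: sigma / sqrt(2 * (N - 1))."""
    return sigma / sqrt(2 * (n - 1))

sigma, n = 4.7, 106
se = standard_error_of_sd(sigma, n)
print(f"standard error of sigma = {se:.2f}; "
      f"68 in 100 range: {sigma - se:.2f} to {sigma + se:.2f}")
```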


Standard Error of Coefficient of Correlation 


The dependability of the coefficient of correlation may be computed as follows:

Formula No. 57

$$\sigma_r = \frac{1 - r^2}{\sqrt{N - 1}}$$

In the case of the correlation between the height and weight of school children in Worksheet No. 43, the value is:

$$\sigma_r = \frac{1 - .775^2}{\sqrt{75 - 1}} = \frac{1 - .6006}{\sqrt{74}} = \frac{.3994}{8.602} = .046$$

which is read .775 ± .046, or .729 to .821.
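Formula No. 57 is a one-line function. This Python sketch reproduces the Worksheet No. 43 figures (r = .775, N = 75); `standard_error_of_r` is an illustrative name, and, as the following paragraph explains, the result should be trusted only when r is near zero and N is large.

```python
from math import sqrt

def standard_error_of_r(r, n):
    """Formula No. 57: (1 - r**2) / sqrt(N - 1)."""
    return (1 - r ** 2) / sqrt(n - 1)

r, n = 0.775, 75
se = standard_error_of_r(r, n)
print(f"r = {r} ± {se:.3f}, or {r - se:.3f} to {r + se:.3f}")
```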

This method of computing the standard error of the coefficient of correlation is not valid unless (1) the value of r is small (near zero) and (2) the size of the sample is large. As was pointed out in the explanation of probability, any valid use of a standard error rests on the assumption that when any given statistic is computed for a series of samples from the same population, these successive statistics fall in the form of a normal distribution. This result cannot occur unless the correlation of the population approaches zero. Since the maximum of correlation is either +1 or -1, the distribution must be highly skewed if r is large. If the true r of a population is +.94, the distribution of the r's of 100 samples from this population must be highly skewed to the left, because r cannot be more than +1.0 but can be -1.0. On the right-hand side the distribution can vary in a range of only .06, or from +.94 to +1.0, but on the left side it may vary through a range of 1.94, or from -1.0 to +.94. Such a distribution is too highly skewed for the relationships of the normal curve to be valid in its measurement. Also, if the sample is quite small, the distribution of the r's will be erratic.

In such cases, (1) when r is large, or (2) when N is small, the
significance of r may be measured by computing t¹ from the formula

$$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$$

¹ The symbol t indicates the ratio of a statistical measure, which is
distributed normally around a hypothetical mean usually taken as zero, to an
estimate of the standard error of that measure based on the number of degrees
of freedom involved. It has many uses, of which that indicated here is
only one. The point that the student should remember here is that t
indicates a ratio between a statistical measure and its standard error. A fuller
explanation will be given in Chapter 20, which treats of small samples.

The large T indicates a normal deviate. A normal deviate is the deviation
of a statistical measurement from the mean of a normal distribution expressed
in units of its standard deviation.
By this method one may compute the probability of r differing
significantly from zero.

$$t = \frac{.94\sqrt{27 - 2}}{\sqrt{1 - .94^2}} = \frac{4.70}{.34} = 13.82$$

$$t = \frac{.30\sqrt{27 - 2}}{\sqrt{1 - .30^2}} = \frac{1.50}{.954} = 1.57$$

For a sample of 27 items and r = .94, the value of t required for
high significance (the 1 per cent level, with 25 degrees of freedom)
is 2.787. The computed value of 13.82 is about five times the
required 2.787. But when N = 27 and r = .30, the computed value
of t is only 1.57, which is too small to prove that r is significant.

But when N is large, as with N = 402,

$$t = \frac{.30\sqrt{402 - 2}}{\sqrt{1 - .30^2}} = \frac{6.00}{.954} = 6.29$$

a correlation of .30 is highly significant. The required value in
the table is 2.576, which is less than one-half of our computed
value.
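The three t computations above can be reproduced with a short Python sketch. The required values (2.787 for 25 degrees of freedom, 2.576 for a large sample) are copied from the text, not computed. The name t_for_r is hypothetical. Without rounding the denominator to .34, the first ratio comes to about 13.78 rather than 13.82; the conclusion is the same.

```python
import math

def t_for_r(r, n):
    """t-ratio for testing whether r differs from zero: r*sqrt(N - 2) / sqrt(1 - r**2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# (r, N, value of t required at the 1 per cent level, as given in the text)
cases = [(0.94, 27, 2.787), (0.30, 27, 2.787), (0.30, 402, 2.576)]
for r, n, required in cases:
    t = t_for_r(r, n)
    verdict = "significant" if t > required else "not significant"
    print(f"r = {r:.2f}, N = {n}: t = {t:.2f} ({verdict} at the 1 per cent level)")
```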


Transformation of r to z 

The t-test is applicable only for determining whether the computed
r is significantly different from zero. R. A. Fisher has devised
a method for testing the significance of the difference (1) between
any two correlation coefficients, and (2) between a computed r and
any theoretical value.

A simplified statement of the formula is

$$z = 1.15129 \log_{10}\frac{1 + r}{1 - r}$$

If r = .50,

$$z = 1.15129 \log_{10}\frac{1.50}{.50} = 1.15129 \log_{10} 3.0 = 1.15129 \times .477121 = .5493$$

If r = .80,

$$z = 1.15129 \log_{10}\frac{1.80}{.20} = 1.15129 \log_{10} 9.0 = 1.15129 \times .954243 = 1.0986$$

When r = 0, z = 0, but because z is based on logarithms the z
values increase at a faster rate than the r-values.

z-values and corresponding r-values

  z       r
  .5    .4621
 1.0    .7616
 1.5    .9051
 2.0    .9640
 2.5    .9866
 3.0    .9951

As the z values increase at a constant rate, the r values increase
at a declining rate.

r-values and corresponding z-values

  r        z
  .20     .2027
  .40     .4236
  .60     .6931
  .80    1.0986
  .90    1.4722
  .95    1.8318
 1.00    Infinity

As the r-values increase at a constant rate, the z-values increase
at an increasing rate.
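The transformation can be computed directly in Python. The constant 1.15129 is half the natural logarithm of 10 (2.302585 ÷ 2). The text's formula is therefore the same as ½ logₑ((1 + r)/(1 − r)), which Python's standard library provides as math.atanh. Its inverse, math.tanh, turns z back into r. The helper name fisher_z is hypothetical.

```python
import math

def fisher_z(r):
    """z = 1.15129 * log10((1 + r) / (1 - r)), the text's form of Fisher's transformation."""
    return 1.15129 * math.log10((1 + r) / (1 - r))

for r in (0.20, 0.40, 0.50, 0.60, 0.80, 0.90, 0.95):
    # math.atanh(r) is the same transformation written with natural logarithms
    print(f"r = {r:.2f}   z = {fisher_z(r):.4f}   atanh(r) = {math.atanh(r):.4f}")

# The other direction: r = tanh(z)
for z in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"z = {z:.1f}   r = {math.tanh(z):.4f}")
```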

The z values conform to the normal curve, and

$$\sigma_z = \frac{1}{\sqrt{N - 3}}$$

is the formula for the standard error of z.

It is not necessary to compute the z values because the z equiva- 
lent for most r-values may be read from Appendix Table III. 
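The two tests named above can be sketched in Python with σ_z = 1/√(N − 3). Testing a difference between two r's combines the two squared standard errors, assuming the samples are independent. This is the usual practice, but the text does not spell out that formula here. All sample figures below are hypothetical, and the function names are illustrative.

```python
import math

def se_z(n):
    """Standard error of z: 1 / sqrt(N - 3)."""
    return 1 / math.sqrt(n - 3)

def r_vs_theory(r, n, rho):
    """Normal deviate for a computed r against a theoretical value rho."""
    return (math.atanh(r) - math.atanh(rho)) / se_z(n)

def r_vs_r(r1, n1, r2, n2):
    """Normal deviate for the difference of two r's from independent samples (assumed)."""
    se_diff = math.sqrt(se_z(n1) ** 2 + se_z(n2) ** 2)
    return (math.atanh(r1) - math.atanh(r2)) / se_diff

# Hypothetical figures for illustration only
print(round(r_vs_theory(0.80, 103, 0.60), 2))
print(round(r_vs_r(0.80, 103, 0.60, 103), 2))
```

Here math.atanh(r) gives the same z as 1.15129 log₁₀((1 + r)/(1 − r)).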

In Chapter 11, the standard error of estimate was presented as
the means of measuring the accuracy of the estimates of Y from
X in the regression equations. See pages 247-258.¹

¹ Formula No. 32: $$S_y = \sigma_y\sqrt{1 - r^2} \quad\text{or}\quad S_y = \sqrt{\frac{\Sigma(Y - Y')^2}{N}}$$

Difference of Two Means

In comparing the means of two samples the question arises as to
whether the two samples could have been obtained from the same
parent population or whether they represent two separate popu-
lations. If they are from the same population, the means will
differ by no more than the variation that is determined by chance
in random sampling; but if their difference is greater than that
which could result from chance sampling, the samples must be
considered as having been drawn from two separate populations.

The method of testing this similarity of sample means is to
compare the difference of the two means with the square root of
the sum of the squared standard errors of the two means.

Formula No. 58

$$T = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_D}$$

Meaning of symbols:

T = the difference of two means expressed as a normal
deviate

σ_D = square root of the sum of the two squared standard
errors of the means of the samples

Formula No. 59

$$\sigma_D = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}$$

If T is more than 2.63, the chances are only 1 out of 100 that 
the two samples are from the same parent population. If T is 
only 2.0, the chances are only 1 out of 21 that the two samples 
were drawn from the same universe. Values of less than 2.0 are 
not very dependable. They do not indicate clearly that the 
samples are from separate populations. These ratios apply only 
to large samples of 100 or more items. For smaller samples the 
ratios must be larger. 
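Formulas No. 58 and 59 can be put together in a short Python sketch. The function takes the two standard errors of the means directly, so it does not depend on which formula was used to compute them. The means and standard errors below are hypothetical, and critical_ratio is an illustrative name.

```python
import math

def critical_ratio(mean1, se1, mean2, se2):
    """T = (X1 - X2) / sigma_D with sigma_D = sqrt(se1**2 + se2**2) (Formulas 58 and 59)."""
    sigma_d = math.sqrt(se1 ** 2 + se2 ** 2)
    return (mean1 - mean2) / sigma_d

# Hypothetical means and standard errors of two large samples (not from the text)
ratio = critical_ratio(54.0, 0.6, 51.5, 0.8)
print(f"T = {ratio:.2f}")
if abs(ratio) < 2.0:
    print("Not a dependable indication of separate populations")
else:
    print("T of 2.0 or more: a chance difference of this size is unlikely")
```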

Probable Error 

The probable error of a mean is 0.6745 σ_X̄. It measures a 50-50
probability, while the standard error measures a 68-32 probability.
The latter is the more desirable measure.
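The 50-50 and 68-32 figures can be checked with the standard normal curve in Python's statistics.NormalDist (Python 3.8 or later). The helper names are illustrative. The last line applies 0.6745 to the standard error of .86 used earlier in this chapter.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution lying within ±k standard deviations of the mean."""
    z = NormalDist()  # standard normal curve, mean 0 and standard deviation 1
    return z.cdf(k) - z.cdf(-k)

def probable_error(standard_error):
    """Probable error = 0.6745 × standard error."""
    return 0.6745 * standard_error

for k in (0.6745, 1, 2, 3):
    print(f"±{k} sigma: {share_within(k):.4f}")

print(round(probable_error(0.86), 2))
```

About half of a normal distribution lies within ±0.6745σ, about 68 per cent within ±1σ, and about 95 per cent within ±2σ.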

These are the more common measures of error. Other and more 
complex measures will be taken up in later chapters. 
SUMMARY

Sampling 

1. Sampling is the basis of most statistical studies. 

2. The value of statistical studies tends to be limited by the size and 
representativeness of the sample. 

3. Samples tend to improve in the ratio of the square root of their size. 

4. A group of samples from the same population tends to possess and 
exhibit the same characteristics. 

5. All samples should be taken at random. They should be as free 
from all bias as possible. 

6. A stratified sample is one which conforms to certain limited known 
characteristics of the population as to internal proportions; but within 
these proportions is taken at random. 

7. A parameter is the true measure of a population. 

8. A statistic is a computed measure of a sample. 

Probability

1. The probability of an event is the relative frequency with which 
this event recurs in a prolonged sequence or series of observations. 

2. The probabilities of mutually exclusive events are additive. 

3. The probabilities of simultaneous and independent events are 
multiplicative. 

4. Variate values of pure chance occurrences tend to be distributed
about their mean approximately in proportion to the values of the co-
efficients of the nth power of a suitable binomial.

5. The internal relationships of the constants of the normal curve of 
error are useful in indicating and measuring the characteristics of a popu- 
lation that approaches a normal distribution. To the degree that the pop- 
ulation departs from normal the dependability of the inferences declines. 

6. A range of ±1σ_x from the mean includes 68.27 per cent of the items of a
normal distribution.

7. A range of ±2σ_x from the mean includes 95.45 per cent of the items of a
normal distribution.

8. Social and economic data tend to fall in more highly skewed distribu- 
tions than do data of the physical and biological sciences, and, therefore, 
require more refined treatment. 
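Item 4 can be illustrated with a hypothetical case of six coins thrown 64 times. The expected frequencies come from the terms of (½ + ½)⁶, and they are set beside the ordinates of the normal curve Y = Ni/(σ√(2π)) e^(−x²/2σ²), where √(2π) ≈ 2.5066 and x is measured from the mean. The function names are illustrative.

```python
import math

def binomial_frequencies(n, throws):
    """Expected frequencies of 0..n heads: throws times the terms of (1/2 + 1/2)**n."""
    return [throws * math.comb(n, k) / 2 ** n for k in range(n + 1)]

def normal_ordinate(x, total, sigma, i=1):
    """Normal-curve ordinate N*i/(sigma*sqrt(2*pi)) * e**(-x**2 / (2*sigma**2))."""
    return total * i / (sigma * math.sqrt(2 * math.pi)) * math.exp(-x ** 2 / (2 * sigma ** 2))

n, throws = 6, 64                      # hypothetical: six coins thrown 64 times
mean, sigma = n / 2, math.sqrt(n / 4)  # mean = np, sigma = sqrt(npq), with p = q = 1/2
for k, f in enumerate(binomial_frequencies(n, throws)):
    y = normal_ordinate(k - mean, throws, sigma)
    print(f"{k} heads: binomial {f:5.1f}   normal curve {y:5.1f}")
```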


Error 

1. Error in the statistical sense is a deviation or difference. It is not a 
mistake. 

2. The standard error is expressed in terms of the standard deviation and 
is applied only to variates having an approximately normal distribution. 
3. The distribution of the means of a group of samples taken from
the same population falls in a normal curve.

4. The means of small samples tend to be more variable and erratic 
than the means of large samples. 

5. There is a standard error for most statistical computations, the 
mean, standard deviation, coefficient of correlation, regression line, etc., 
and no statistical computation is complete until its standard error is com- 
puted. The standard error of a statistic is the means of measuring its 
reliability and dependability. 

6. The standard error of the difference between two means is the de- 
vice for determining whether the difference between two means is so 
great that the two samples could not reasonably be considered as being 
taken from the same population. This test is frequently called the 
Critical Ratio. 
REVIEW QUESTIONS

1. What is a random sample? Explain fully. 

2. Define universe, population. 

3. Explain “the inertia of large numbers.” Give examples.

4. Explain “the permanence or persistence of small numbers.” Give
examples.

5. Explain “multiplicity of forces” as used in explaining sampling.
Give examples.

6. Explain “independence of forces” as used in describing samples.
Give examples. 
7. Explain “equality of forces” as used in describing samples. Give
examples.

8. What is “bias” in sampling? Give illustrations.

9. What is a stratified sample? Give examples.

10. What is a purposive sample? Give examples.

11. What is a stratified purposive sample? Give examples.

12. What is a parameter? Explain fully. 

13. What is the difference between a parameter and a statistic? 

14. What is the meaning of probability as used in statistics? Give 
examples. 

15. How often, theoretically, could one draw all four aces from a 
shuffled deck of 52 cards at one draw? 

16. What is the normal curve of error? Of what use is it? 

17. What is the relation of the binomial expansion, (½ + ½)ⁿ, to the
theory of probability?

18. If ten coins are thrown 1,024 times, what is the theoretical prob-
ability of the number of times they would fall 2 heads and 8 tails?

19. What is the meaning of error as used in statistics? Give examples. 

20. What is meant by the standard error of the mean? How is it 
computed? Of what use is it in statistics? 

21. What is the standard error of the standard deviation? How is it 
computed? Of what use is it? 

22. What is the standard error of the coefficient of correlation? How 
is it computed? Of what use is it? 

23. What is the standard error of the difference of two means? How 
is it computed? Of what use is it in statistics? 

24. What is T? Of what use is it?

25. What is the relation of probable error to standard error? Explain. 

EXERCISES 

1. σ_x = 6, N = 65, X̄ = 40. Write X̄ ± σ_X̄.

2. σ_x = 3.5, N = 50, X̄ = 20. Compute the size of the sample re-
quired to reduce the standard error of the mean to 20 ± 0.1.

3. Compute σ_σ for Exercises 1 and 2.

4. For one sample, σ_x = 4, N = 65, X̄ = 40. For another sample,
σ_x = 4.5, N = 81, X̄ = 44. Compute T.

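5. $$Y_c = \frac{Ni}{2.5066\,\sigma} \times 2.71828^{-x^2/2\sigma^2}$$
where x is the deviation from the mean (at the mean, x = 0 and Y_c = Ni/2.5066σ).
N = 145, i = 2, σ_x = 6. Construct the normal curve of error.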




Part Three 

The Analysis of Time Series 


CHAPTER 15 

TIME SERIES ANALYSIS 


Two of the most important bases for classifying data are 
spatial series and time series. Spatial series are based on geo- 
graphic distribution such as farms, city blocks, villages, cities, 
counties, states, and nations. Time is not an element in such 
data. These data are taken at the same point of time or without 
any regard for time. Examples of such geographic or spatial 
series are wheat yields by farms, counties, or states for a single 
year; petroleum production in many fields for the same week or 
year; egg production of several flocks for the same month; shoe 
production for many factories for the same day, month, or year, 
and the like. In all such series, time is held constant or eliminated. 
Most of the analyses developed in the previous ten chapters of 
this book are designed primarily for the treatment of spatial or 
non-time series of data. The school children data were analyzed
on this basis. The measurements were for a single point of time. 

The other large group of data is called time series. A time 
series is a sequence of values which correspond to successive points 
or periods of time. Wheat production by years, automobile pro- 
duction by months, the average monthly price of crude oil, weekly 
freight car loadings, daily prices of a stock or bond on the security 
market, the temperature readings by hours, the number of copies 
of a newspaper or hand bill run off a printing press in successive 
minutes, are all time series. The time unit may vary from cen-
turies or millennia to minutes or seconds. The most common
unit for factory production and commodity prices is the month. 
For agricultural production the usual time unit is the year. Stock 
and bond prices are daily quotations. The same variable may 
supply several time series, such as (1) daily petroleum production, 
(2) weekly petroleum production, (3) monthly and (4) annual 
petroleum production. A time series for a shorter time period 
may be changed to a series for a longer time period merely by 
adding as many of the units for the shorter period as there are 
short periods in a longer time unit, as for instance, changing 
hourly data to daily data, or changing monthly data to yearly data. 
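The conversion from shorter to longer periods can be sketched in Python. The monthly figures below are hypothetical, and to_longer_periods is an illustrative name. Adding applies to quantities such as production or loadings. Price series, such as the egg prices of Worksheet No. 47, are usually averaged rather than added.

```python
from collections import defaultdict

def to_longer_periods(series, period_of):
    """Add the values of shorter periods into totals for longer periods.

    series maps a short-period label to its value; period_of maps a label to the
    longer period it belongs to (for example, a (year, month) pair to its year).
    """
    totals = defaultdict(int)
    for label, value in series.items():
        totals[period_of(label)] += value
    return dict(totals)

# Hypothetical monthly production figures keyed by (year, month)
monthly = {(1940, m): 100 + m for m in range(1, 13)}
monthly.update({(1941, m): 120 + m for m in range(1, 13)})

annual = to_longer_periods(monthly, period_of=lambda ym: ym[0])
print(annual)
```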


COMPLEXITY OF TIME SERIES 

Since a time series is a succession of measurements through the 
passing of time, it may be very complex. Ordinarily a time 
series is not a simple one-unit variable. Such data usually are a 
composite of three, four, or even more variations, all taking place 
simultaneously. The four types of changes usually found in time 
data are: 

1. Seasonal variation
2. Secular trend
3. Cyclical fluctuations
4. Random or accidental changes

The figures for automobile production at any point of time may 
include all these movements. The number of automobiles pro- 
duced or sold varies from month to month because of the time of 
year. The time of year when new models are offered or when the 
weather is more favorable for driving shows larger figures. The 
totals for any period are partially the result of the long-time 
growth, or trend, of the industry. Growth of population, im- 
provement of highways, increased exports and other factors tend 
to raise the trend. The substitution of other products tends to 
lower the trend. The monthly figures are also influenced by fluctua-
tions of alternating periods of good and bad times, or prosperity
and depression. Besides all these there are occasional or acci- 
dental factors, such as wars, droughts, storms, floods, pestilence, 
fires, earthquakes, etc., which may occur at any time but are
quite irregular. Some series, such as security prices or grain fu-
tures, are affected by daily or even hourly fluctuations.

WORKSHEET NO. 46 


U.S. Annual Automobile Production for the Years
1900-1941, Inclusive (In 1,000's)

Year  Production    Year  Production    Year  Production    Year  Production
1900           5    1911         168    1922       2,274    1932       1,135
1901           5    1912         313    1923       3,625    1933       1,573
1902           9    1913         462    1924       3,186    1934       2,178
1903          10    1914         544    1925       3,735    1935       3,252
1904          23    1915         896    1926       3,784    1936       4,454
1905          24    1916       1,526    1927       2,936    1937       4,333
1906          29    1917       1,746    1928       3,815    1938       2,001
1907          35    1918         943    1929       4,587    1939       2,929
1908          58    1919       1,658    1930       2,785    1940       3,755
1909         115    1920       1,906    1931       1,973    1941       3,816
1910         170    1921       1,442


In Worksheet No. 46 are presented the data of automobile pro- 
duction for forty-two years, 1900-1941, inclusive. These data 
show the development of this great industry from its early and 
small beginning to its full maturity. 


WORKSHEET NO. 47 

Chicago Wholesale Price of Fresh Eggs, 1931-1941, Inclusive
Cents Per Dozen*

Year  Jan.  Feb.  Mar.  Apr.   May  June  July  Aug. Sept.  Oct.  Nov.  Dec.  Average
1931  21.1  16.2  19.2  17.5  16.7  15.9  17.9  19.1  20.0  24.3  29.3  24.8     20.2
1932  17.5  14.6  12.2  12.5  12.7  12.5  13.8  17.0  20.0  23.7  29.7  28.8     17.9
1933  20.6  12.9  12.4  12.7  13.2  12.2  14.0  13.7  17.0  19.5  22.6  19.3     15.8
1934  20.3  17.0  16.6  15.6  15.2  14.7  15.3  19.5  21.3  23.5  26.7  26.2     19.3
1935  27.5  27.8  21.2  23.0  24.0  22.9  22.9  24.6  26.1  26.8  29.2  27.2     25.3
1936  23.2  27.5  19.6  19.2  20.2  21.0  21.4  22.6  24.8  27.4  33.5  29.6     24.2
1937  23.2  21.7  22.6  21.8  20.1  19.1  20.0  20.1  22.2  22.1  26.5  24.3     21.9
1938  20.9  16.9  17.4  17.8  19.5  19.3  20.3  21.0  23.1  25.3  27.3  25.4     21.3
1939  18.1  16.5  16.6  16.4  15.8  15.3  15.4  15.5  18.2  20.1  23.6  19.1     17.6
1940  20.8  21.3  16.4  16.4  16.5  15.6  15.8  16.3  19.3  20.3  23.6  25.2     18.9
1941  18.4  16.7  17.8  21.6  22.3  25.1  26.1  27.7  29.0  31.0  36.0  34.5     25.5

* Source: Statistical Section, Yearbook of Agriculture, 1941.



Fig. 67. Chicago wholesale price of fresh eggs, 1931-1941 inclusive,
cents per dozen. (Statistical section, Yearbook of Agriculture, 1941)

WORKSHEET NO. 48 


Air Transport in the United States, 1926-1940, Inclusive*

Year  Miles Flown  Passengers  Aircraft
                      Carried  Exported
1926    4,258,771       5,782        50
1927    5,779,863       8,661        63
1928   10,400,239      47,840       162
1929   22,380,020     159,751       348
1930   31,992,634     374,935       321
1931   42,755,417     469,981       140
1932   45,606,354     474,279       280
1933   48,771,553     493,141       406
1934   40,955,396     461,743       490
1935   55,380,753     746,946       333
1936   63,777,226   1,020,931       550
1937   66,071,507   1,102,707       629
1938   69,668,827   1,343,427       876
1939   82,571,523   1,876,051     1,220
1940  108,800,136   2,959,480     3,532
1941                3,969,000    6,000†

* Source: Moody's Investor Service, Railroads, 1941, p. a42.
† Estimated.
Fig. 68. Miles flown in American air transport, 1926-1940, Arithmetic
Scale. (Moody's Investor Service, Railroads, 1941, p. 42)


The three preceding worksheets indicate the form in which 
time series data are usually collected and made ready for analysis. 
Worksheet No. 47 indicates the form required for a complete 
time series analysis including trend, seasonal, and cycle based on 
monthly data. For such a complete analysis, monthly data are 
best. Data for longer time periods, such as quarterly or
semi-annual data, do not permit a complete, detailed measurement
of seasonal variation. Data for shorter periods, such as weekly
or daily figures, are subject to too many small chance varia-
tions to be well suited to trend or cyclical measurements. Monthly
data, therefore, are most widely used.

Fig. 69. Miles flown in American air transport, 1926-1940, Logarithmic
Scale. (Moody's Investor Service, Railroads, 1941, p. 42)

The automobile data of Worksheet No. 46 shown in Fig. 73
indicate the typical growth curve of an industry from infancy 
to maturity. This growth trend is characteristic not only of auto- 
mobile production, but also of all other industries including radios, 
refrigerators, farm machinery, textiles, metals, and even novel- 
ties. During the earlier years of the new industry’s development, 
growth is slow. This is the period when processes are being per- 
fected, advertising expanded, good will created, and sound financ- 
ing established. Then comes the period of rapid expansion during 
which time the strong firms that have survived the trying initial 
years fill a ready market with their products. After such an ex-
pansion period, which may last from a decade to half a century,
a condition of market saturation is reached, after which most
sales are replacements. Up to 1925 most automobile sales were
new sales, or sales to new owners. After 1929 most automobile
sales were to persons who had already owned cars. Some indus-
tries pass through these three periods of growth in a few years.
Games, fads, and small articles of personal use have such a short
life cycle. Yo-yo tops, “baby golf,” and costume jewelry are
examples. On the other hand, it took radios twenty years, auto- 
mobiles forty years and railroads eighty years to attain points of 
saturation. 

Trends may be downward as well as upward. Illustrations of 
such series are the production of farm wagons, the number of 
horses and mules on farms, the production of buggies and other 
horse-drawn carriages, the acreage in wheat in Iowa or Wisconsin 
since 1900, the acreage of cotton since 1933, the production of 
anthracite coal since 1910, the decline of population in decadent 
mining communities and in some farming and manufacturing 
areas, and other series which the student may readily call to mind. 
Such trends may be caused by soil exhaustion, mineral depletion, 
changes in transportation, substitution of new products for old, 
new methods of production, congressional or legislative enact- 
ments, and many other factors. The complete measurement of 
trend for some industries from their early developments through 
their maximum expansion to their ultimate decline frequently 
takes the form of a simple parabola or a slightly modified form of 
such a curve. Worksheet No. 49 indicates such a group of time 
series. 

The decline in the number of horses and mules is caused by the 
mechanization of agriculture and city transportation. The de- 
cline in the number of banks is the result of establishing too many 
small banks prior to 1920 and their subsequent failure or con-
solidation.
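The text does not show here how a parabolic trend such as that of Fig. 70 is fitted. One common approach, assumed in this sketch, is the method of least squares for a second-degree curve. It is fitted to the decade horse figures of Worksheet No. 49, not to the annual 1866-1941 data behind Fig. 70, so the curve will differ from the one in the figure. Centring the years on their mean makes the normal equations simple when the years are equally spaced. fit_parabola is an illustrative name.

```python
def fit_parabola(xs, ys):
    """Least-squares parabola y = a + b*d + c*d**2, with d = x - mean(x).

    Assumes equally spaced x values: the sums of d and d**3 are then zero, and the
    normal equations split into one equation for b and a pair for a and c.
    """
    step = xs[1] - xs[0]
    if any(x2 - x1 != step for x1, x2 in zip(xs, xs[1:])):
        raise ValueError("x values must be equally spaced")
    n = len(xs)
    x_bar = sum(xs) / n
    d = [x - x_bar for x in xs]
    s2 = sum(v ** 2 for v in d)
    s4 = sum(v ** 4 for v in d)
    sy = sum(ys)
    sdy = sum(v * y for v, y in zip(d, ys))
    sd2y = sum(v ** 2 * y for v, y in zip(d, ys))
    b = sdy / s2
    # Solve  n*a + s2*c = sy  and  s2*a + s4*c = sd2y  for a and c
    c = (n * sd2y - s2 * sy) / (n * s4 - s2 ** 2)
    a = (sy - s2 * c) / n
    return x_bar, a, b, c

# Number of horses, in 1,000's, by decades (Worksheet No. 49)
years = [1871, 1881, 1891, 1901, 1911, 1921, 1931, 1941]
horses = [8054, 11187, 16329, 17955, 20418, 19369, 13195, 10364]

x_bar, a, b, c = fit_parabola(years, horses)
peak_year = x_bar - b / (2 * c)  # where the fitted curve stops rising
print(f"c = {c:.2f}; fitted trend peaks near {peak_year:.0f}")
for x, y in zip(years, horses):
    d = x - x_bar
    print(x, y, round(a + b * d + c * d * d))
```

A negative c marks a curve that rises and then declines, the pattern described above for horses and mules.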

Trends may be measured in terms of absolute numbers or in 
ratios of change or percentages. The former is done on the arith-
metic scale; the latter on the logarithmic scale.

The value of the logarithmic scale for measuring trends is that 




WORKSHEET NO. 49 


Number of Horses, Mules, and Commercial Banks
in the United States Since 1870, by Decades

Year  Horses (1,000's)  Mules (1,000's)    Year  Number of Banks
1871             8,054            1,305    1870            1,937
1881            11,187            1,912    1880            3,355
1891            16,329            2,377    1890            8,201
1901            17,955            3,190    1900           10,382
1911            20,418            4,429    1910           23,095
1921            19,369            5,768    1920           30,139
1931            13,195            5,273    1930           24,079
1941            10,364            4,238    1940           15,000


Fig. 70. Horse population of U.S., 1866-1941, and parabolic
trend. (Statistical section, Agricultural Yearbook, 1941)
Fig. 71. Number of banks in U.S., 1850-1940. (Statistical Abstract
of the United States, 1942)


businessmen are often more interested in the rate at which their 
volume is increasing or decreasing than in the absolute amount. 
In comparing the progress of salesmen, or the growth of branch 
plants or units of a chain store system, percentage changes are 
more significant than total figures because of the relative size of 
the units compared. 
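Why the logarithmic scale shows rates of change can be illustrated with the miles-flown figures of Worksheet No. 48. On a base-10 logarithmic scale the vertical distance between two values is the difference of their logarithms, log₁₀(new/old). Equal percentage changes therefore give equal steps, whatever the absolute size of the figures. The helper names pct_change and log_step are illustrative.

```python
import math

# Miles flown in U.S. air transport, from Worksheet No. 48
miles = {1936: 63_777_226, 1937: 66_071_507, 1938: 69_668_827,
         1939: 82_571_523, 1940: 108_800_136}

def pct_change(old, new):
    """Percentage change from old to new."""
    return 100 * (new / old - 1)

def log_step(old, new):
    """Vertical distance between two values on a logarithmic (base-10) scale."""
    return math.log10(new) - math.log10(old)

years = sorted(miles)
for prev, year in zip(years, years[1:]):
    old, new = miles[prev], miles[year]
    print(f"{prev}-{year}: {pct_change(old, new):+5.1f}%   log step {log_step(old, new):.4f}")

# A doubling is the same step on a log scale whether from 10 or from 1,000
print(round(log_step(10, 20), 4), round(log_step(1_000, 2_000), 4))
```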

Seasonal 

Month-to-month changes are the easiest variations in a series 
to observe and comprehend and for managerial purposes require 
the most constant attention. All merchants must stock and 
alter the inventory of their goods with the changing seasons. 
Heavy suits, coats and footwear must replace lighter goods in 
the autumn, while the reverse is true in the spring. In a wheat 
country, plows, harrows, and seeders are sold in the fall, and 
binders, combines, and threshers in the spring and summer.
The stocking of seasonable fruits and vegetables keeps the gro-
ceryman and produce house on the constant alert. The sales of
automobiles, gasoline, and tourist resort services rise in the sum-
mer. Highway construction as well as most building is larger
in the frost-free season. Baseball, football, basketball, wrestling,
track events all run in a marked seasonal pattern as does all 
school attendance. The canning of fruits and vegetables is highly 
seasonal as well as all other agricultural production, processing, 
and marketing. The demand for bank credit and the accumula- 
tion of surplus reserves are well known seasonal phenomena. 
The manifest expansion of Easter, Thanksgiving, and especially
Christmas trade is universally known. Even the proverbial
abundance of “June brides” may be associated with the poetic
line, “In the spring a young man's fancy lightly (or seriously)
turns to thoughts of love.” So universally are our daily lives
affected by seasonal changes that we are all clearly conscious of 
this variation and of how to make adjustments to it. 

The Cycle 

Of all time series changes the cycle is the most erratic and un- 
predictable. Business cycles are not only irregular for a single 
series of data but also vary widely from one series to another 
both as to extent and time of fluctuation. The price of bread, 
taking both the quality and size of the loaf into consideration, 
may vary only a cent or two in the course of five years while the 
price of flour may rise or fall from twenty to fifty percent of its 
average value. The price of wheat during the same period may 
range from $.40 to $2.00 a bushel. As a general principle, the 
nearer a good is to the condition required for final consump- 
tion, the smaller is the cycle, and the nearer it is to the condition 
of a raw material, the wider the fluctuations of its price. The 
prices of nails, screws and knives change much less than do the 
price and production of pig iron and steel. For the same good for 
both production and price, no two successive cycles are alike or 
even similar. In fact, these changes are not real cycles at all, 
but are merely fluctuations. They possess none of the periodic- 
ity or regularity of the movement of the seasons, the phases of 
the moon, or even of the weather. It is impossible to forecast 
either the exact or even approximate time or severity of a busi-
ness cycle. Certain cycles connected with biological or agricul-
tural production, such as the “corn and hog cycle” and the “cattle
cycle,” or the life cycle of certain pests, are more regular. In fact,
the entire physical universe seems to run in a series of fluctuations,
from day and night on the planets to the precession of the equi-
noxes, of which the solar radiation, or “sun spot,” cycle is among
the most uniform, running in periods of approximately eleven
years and multiples of that time. The more accurate measure-
ment of business cycles is a highly specialized and advanced field
of statistical analysis which is clearly beyond an elementary
study of basic statistics. Fig. 72 will indicate to the student the
variety of fluctuations in the generalized or averaged business
cycle. The cycles for separate fields of business and the individual
phases of business activities within each of these fields fluctuate
more unpredictably and erratically than the general cycle shown
in Fig. 72.

Fig. 72. Volume of industrial production of consumption goods, average
1923-1925 = 100. (The Cleveland Trust Company Business Bulletin,
March 15, 1934)

Irregular Changes 

Excellent examples of the occasional factors which affect 
time series in agriculture and industry are the San Francisco 
Earthquake of 1906, the great Ohio and Mississippi floods of
1927 and 1937, the tropical hurricanes of 1900 which destroyed 
Galveston and of 1938 which caused so much damage in New 
England, and the severe droughts of 1930, 1934, and 1936, which 
so greatly reduced the crop production of the United States. 
The First and Second World Wars are among the most powerful 
disrupting factors in the normal flow of business and life in gen- 
eral that mankind has ever experienced. Such accidental and 
irregular “acts of God” and political upheavals cannot be fore- 
cast with any large degree of accuracy or prevented or controlled. 
In the analysis of time series we find it very difficult to separate 
such sporadic developments from the ordinary fluctuation of 
business cycles. 

SOURCES OF TIME SERIES DATA 

Every movement and activity in the world or within the 
knowledge of man from the range of the microscope to that of the 
telescope that continues through time may be considered a time 
series. In ordinary statistical analysis time series may be con- 
sidered as limited primarily to business, social, and political ac- 
tivities. Our illustrations will be confined to business or economic 
data. The sources of such information are numerous and varied. 
The more prominent and useful ones are listed and described 
briefly in the following pages. 

The Daily Newspaper 

The daily paper is a large and useful source of time series data. 
The larger city dailies contain reports on the prices and sales 
of bonds and stocks of many corporations and governmental 
units, interest rates, bank debits, commodity prices of farm 
products, metals, minerals, hides, and textiles, besides exports, 
imports, ship loadings, car loadings, air transport, and commodity 
production. Every successful businessman reads carefully two
sources of business data: a daily newspaper and his own trade
journal. Both are full of time series, many of which are closely
related to his economic interests. 
Government Statistics

The largest single producer of statistical information is the 
United States Government. Its services cover every field of 
social and business activity. Among its publications are: 

The Federal Reserve Bulletin, issued monthly by the Board of
Governors of the Federal Reserve System and sold at a price of
$2.00 a year, covers all banking and financial subjects, besides a
large section of index numbers on industrial production, employ- 
ment, and general business conditions. Most banks are sub- 
scribers, besides public libraries, and many business offices. It is 
an important source of data for all students of business. 

The Survey of Current Business is issued monthly with a four- 
page weekly supplement by the United States Bureau of Foreign 
and Domestic Commerce at a cost of $1.50 a year. It covers the 
entire field of production and trade with monthly and annual 
data. It is the largest and most complete source of information in 
this field. 

Crops and Markets, issued monthly by the United States De- 
partment of Agriculture at a price of $1.00 a year, covers the en- 
tire field of current agricultural production and prices. 

The Monthly Labor Review is issued monthly by the United 
States Department of Labor and covers wage rates, employment, 
and all other labor statistics besides important index numbers. 

Besides these and many other minor monthly and occasional publications, which may be had for a nominal price to cover the costs of publication, there are the large annual reports of the several departments of the federal government. These include the Mineral Yearbook, the Agricultural Yearbook, the Statistical Abstract of the United States, the reports of the Treasury Department and of the Interior Department, the Reports on Domestic and Foreign Commerce, the Market Data Handbook, and the Biennial Census of Manufactures.

The Census Reports 

Besides the great Decennial Census, which covers every phase of population, agriculture, family, trade, commerce, and wealth from New York and Chicago down to the smallest village, township, and farm in the nation, there are the Five-Year Census of Agriculture, the Biennial Census of Manufactures, the Census of Vital Statistics, and the Ten-Year Census of Religious Bodies. In addition, there are thousands of special bulletins and statistical analyses in all fields of economic activity. An important part of any business education is an accurate working knowledge of the sources of statistical information supplied by our federal government. It is truly a servant of the farmer and the businessman.

Private Statistical Services 

There are dozens of private statistical services, some of them very large, and only a few of them can be listed here. The student will soon become acquainted with others in his special field of interest by perusing the shelves of the public library or the trade journals of the business office. A few of the larger ones are listed alphabetically below:

Babson's Reports, which include several types of services and publications for the investor and industrialist, among them the Babsonchart, covering prices, production, the securities markets, and business forecasting.

Business Week (consolidated with the Annalist) is published by 
McGraw-Hill Publishing Company at a cost of $5.00 a year. It 
includes much data besides many analyses and forecasts of cur- 
rent business. 

The Commercial and Financial Chronicle, published by William 
B. Dana Company at a cost of $15.00 a year, covers all phases of 
current business and corporate reports and statistics. It is one of 
the best of the older business publications. 

Dun and Bradstreet, Inc., publishes Dun's Review and Dun's Statistical Review at a cost of $5.00 a year. Its services listing the credit standing or rating of all companies and firms of any consequence in the United States are of invaluable service to businessmen. Its index numbers and data on commodity prices are widely used.

Moody's Investors Service publishes a number of services on security ratings and prices and on corporation profit-and-loss statements, balance sheets, and statistical summaries. Its services are very extensive, covering railroads, public utilities, banks, and industrial companies of all kinds. The Moody publications contain a vast amount of time series data.

The Standard and Poor's Corporation publishes a large number of services, including information on security prices of all types and a large number of price and production index numbers, besides large tabulations of business data. Extensive and frequent analyses of the major fields of business are a large and valuable part of its publications. Almost all important time series in the fields of finance, securities, banking, prices, and production are found in its services.

The United Business Service summarizes the reports of all the principal forecasting services of the United States.

Besides the services listed above, many of the large banks, including the Chase National Bank and the National City Bank, both of New York, and the Cleveland Trust Company of Cleveland, Ohio, as well as all twelve of the Federal Reserve Banks, publish monthly analyses of business conditions and supply a large volume of business data.

All of the hundreds of trade association publications and trade 
journals publish large volumes of data in the special fields covered 
by their respective interests. 

In preparing many of these reports and services, statisticians 
employ the theory and methods outlined in the next four chapters 
of this text. Some of the data appear in their crude and unana- 
lyzed form. Other portions are corrected for seasonal variation. 
Appropriate trend lines are fitted to some of them. Many of the 
indices appear as cycle deviations. Hundreds of complex index 
numbers appear among their offerings. Although many college 
students may never be actively engaged in creating or analyzing 
business statistics, all of them will be consumers of statistics all 
their lives. They will have to read and interpret business data 
and social data of a wide variety. Every newspaper, magazine, 
trade journal, and radio commercial today is full of data on busi- 
ness, social, and political conditions requiring some knowledge of 
statistics in order for the reader or hearer to understand them. We
live in a world of figures, numbers, balance sheets, profit-and-loss 
statements — a world of statistics and statistical analysis — and 
a study of time series analysis is in many ways as useful to the 
consumer of statistical information as it is to the analytical scien- 
tist. In the modern world we literally live by statistics. 

SUMMARY 

1. A spatial series is a group of measurements for a number of geo- 
graphic areas for the same point in time or for the same period of time. 

2. A time series is a sequence of values which correspond to successive 
points or periods of time. A time series requires time sequence. 

3. A time series may be set up for time intervals of any prescribed 
length, such as for seconds, minutes, hours, days, weeks, months, years, 
decades or centuries, or even longer periods. Monthly and yearly meas- 
ures are most common. 

4. A time series is usually quite complex. Frequently several types of 
variation are occurring in the series at the same time. 

5. The four most frequently measured and most important time varia- 
tions are (1) seasonal variation, (2) secular trend, (3) cyclical fluctua- 
tions, and (4) random or accidental change. 

6. Seasonal variation is change in the time series due to the changing 
seasons of the year. In most portions of the earth these changes are 
quite evident and far-reaching in their effect on man's activities.

7. Secular trend is variation in a time series due to long-time growth.
It measures the changes due to evolution, expansion, and decay. 

8. Cyclical fluctuations are those more or less regular periods of ex- 
pansion and contraction which succeed each other at intervals longer 
than seasonal change but shorter than secular change. 

9. Time series analyses may be employed in any field of activity ex- 
tending through time, but they have been more highly developed in the 
field of finance and business activity than elsewhere. 
REVIEW QUESTIONS

1. Define a time series.

2. Explain in detail and give examples of the differences between a 
spatial series and a time series. 

3. Why are time series difficult to analyze? 

4. What four variations are usually found in a time series? 

5. Why is a seasonal index essential to the management of a business? 

6. What is the usual shape of the growth trend of an industry? 

7. Is it possible that an individual basic unit or corporation might 
also have a similar growth trend? Explain. 

8. What is the difference between an arithmetic scale and a logarithmic 
scale? Illustrate. 

9. What are the chief characteristics of a business cycle? 

10. What are erratic or accidental movements in business and how 
are they distinguished from cyclical movements? 

11. Name and explain the content of four different United States 
Censuses. 

12. Name and describe the nature of four private statistical services. 

13. Name ten series of data found in the Federal Reserve Bulletin. 

14. Name ten series of data found in the Survey of Current Business. 

15. Of what advantage is it to the average businessman to understand 
the analysis of time series? Explain. 




CHAPTER 16 

SECULAR TREND 


As explained in Chapter 11, the average long-time movement of a time series is known as secular trend. As distinguished from seasonal variation, which is a month-to-month change within the year, secular trend is the average long-time increase or decrease of a series caused by growth or decay. As distinguished from cycle, which is the fluctuation of a series over a few years caused by prosperity or depression, secular trend is the gradual change over many years caused by the expansion or contraction of the activity as a whole. It is evident to all of us that all human activity, including business and economic institutions and social and political movements, is subject to slow, long-time processes of growth and decay. The majestic ruins of Egypt and Babylon are mute reminders of the growth, power, and decay of cultures so old that they are obscured by the mists of antiquity. Edward Gibbon's Decline and Fall of the Roman Empire is an eloquent reminder of the melancholy ruin of the village on the Tiber which grew in five centuries to be the Mistress of the World. The rise, wealth, and decline of the Hanseatic League, of Tyre, and of Venice are examples of the same force in the business world. Indeed, this world movement of growth and decay applies not only to empires, cultures, and industries, but also to business firms, houses, and products. It is the law of life for everything from empires and corporations, from Rome and the East India Company, to grocery stores, tailor shops, and peanut stands. No statistical analysis of a business can be complete without a measurement of its trend.

At this point it would be well for the student to review Chapter 11, which gives a full general treatment of regression. Time series trends are only a special case of general regression. Because time units are relatively uniform and are continually repeated, it is possible to simplify and abridge the trend formulas for time series so as to save from one-half to three-fourths of the time required by the general regression computations. This is why time series trend is treated in a chapter separate from the basic principles of regression. A careful review of those principles, however, will be of great aid to the student at this point.

GRAPHIC ANALYSIS 

The first step in every case of computing a secular trend is to place the data on coordinate graph paper, of either the arithmetic or the semi-logarithmic type. This graph will enable the statistician to judge at once, and much more accurately, the correct type of trend line and formula to use. Graph the data first. If it is desired to measure the absolute growth of the business, the arithmetic scale is necessary, as is shown in Fig. 73. If the rate of growth, or percentage change, is desired, the semi-logarithmic scale must be used, as in Fig. 74. The logarithmic graph gives the better trend. From 1900 to 1917 automobile production grew at a constant rate of about 30% a year. After 1917 the rate of expansion declined, reaching almost zero by 1941, when the industry reached its full growth with a saturated market.

Fig. 73. Automobile production in U.S., 1900-1941, and long-time trend, showing the development of the industry, Arithmetic Scale. (Standard and Poor's) [Vertical axis: automobiles, in thousands, 0 to 5,000; horizontal axis: years, 1900 to 1940.]

Fig. 74. Automobile production in U.S., 1900-1941, and long-time trend, Logarithmic Scale. (Standard and Poor's)
In Fig. 70 the horse population of the United States is shown with a simple parabola fitted to it. The parabola fits fairly well the trend of an industry or business which has reached maturity and has begun to decline. The number of banks in the United States from 1850 to 1940, however, is not accurately measured by the parabola. The period from 1890 to 1940 would fit a parabola quite well, but the earlier period is better represented by the normal growth curve.

These examples are sufficient to indicate the importance, if 
not the necessity, of plotting all time series data on one or both 
types of coordinate paper before attempting any mathematical 
trend computations. Freehand regression curves which were 
explained in detail in Chapter 11 may be used to good advantage 
as time series trends in many cases, but considerable experience 
and a high degree of skill are necessary for good results. In most 
cases the student should depend on mathematical lines for time 
series trends. 


MOVING AVERAGE TRENDS 

One of the simplest types of secular trend is the moving average of the data for the time periods. The method of its computation is illustrated in Worksheet No. 50. The number of time periods averaged may vary from two to any larger number, but the length of time covered by a moving average should be uniform throughout the series. For instance, if it is decided to use a five-period moving average, the data for the first five time periods are totaled and divided by five, and the resulting average is written opposite the middle period of the group averaged. In Worksheet No. 50 the five items of data 6, 8, 10, 9, and 7 are totaled. The sum is 40, which divided by 5 equals 8, and this is written opposite the third time period. The datum 6 for the first time period is then dropped and the datum 7 for the sixth time period is added. The total of the five items 8, 10, 9, 7, and 7 is 41, the average of which is 8.2; this is written opposite period number 4, the middle of these five periods. After each average the top figure in the group is dropped, and the figure next below the last datum of the previous average is added to make the new total.

By this method of dropping the top figure and adding the next one below the previous group, the total moves down successively through the entire series of time periods. There can never be as many averages as there are original time periods, because each average must be placed at the center of the group averaged. With a three-period moving average there can be no average for the first and last time periods. With a seven-period average there can be no average for the first three and the last three time periods.
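The hand procedure just described, dropping the top figure and adding the next one below, can be sketched in Python. This is a minimal illustration, not part of the original text: the function name `moving_averages` is hypothetical, periods are numbered from 1 as in Worksheet No. 50, and the list `series` is the original data column of that worksheet.

```python
def moving_averages(data, length):
    """Return (center, average) pairs for a simple moving average.

    Periods are numbered from 1. For an odd length the center is a whole
    period; for an even length it falls halfway between two periods.
    """
    if not 1 <= length <= len(data):
        raise ValueError("length must be between 1 and len(data)")
    results = []
    total = sum(data[:length])  # total of the first group
    for start in range(len(data) - length + 1):
        if start > 0:
            # Drop the top figure and add the next figure below the group.
            total += data[start + length - 1] - data[start - 1]
        center = start + 1 + (length - 1) / 2
        results.append((center, total / length))
    return results


# Original data column of Worksheet No. 50 (uniform five-period cycles).
series = [6, 8, 10, 9, 7, 7, 9, 11, 10, 8, 8, 10, 12, 11, 9,
          9, 11, 13, 12, 10, 10, 12, 14, 13, 11]

five = moving_averages(series, 5)
print(five[:2])                  # [(3.0, 8.0), (4.0, 8.2)]
print(len(series) - len(five))   # 4: two periods at each end have no average
```

The running total is updated rather than recomputed, just as in the hand method; the count of missing end values shows the limitation discussed later in the chapter.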

The number of time periods averaged should, as far as possible, be identical with the length of the cyclical movements in the data. If this can be done, a smooth trend can be secured. In Worksheet No. 50 the cycles are five time periods long, and the five-period average gives a smooth line, as shown in Fig. 75.

Fig. 75. Showing uniform cycles and five- and seven-period moving averages [Vertical axis: units of data; horizontal axis: time periods.]

WORKSHEET NO. 50

Typical Moving Averages with Uniform Cycles

Time    Original   2-Period   3-Period   5-Period   7-Period  10-Period
Period      Data     Moving     Moving     Moving     Moving     Moving
                   Average*    Average    Average    Average   Average*
   1           6
   2           8        7.0        8.0
   3          10        9.0        9.0        8.0
   4           9        9.5        8.7        8.2        8.0
   5           7        8.0        7.7        8.4        8.7
   6           7        7.0        7.7        8.6        9.0        8.5
   7           9        8.0        9.0        8.8        8.7        8.7
   8          11       10.0       10.0        9.0        8.6        8.9
   9          10       10.5        9.7        9.2        9.0        9.1
  10           8        9.0        8.7        9.4        9.7        9.3
  11           8        8.0        8.7        9.6       10.0        9.5
  12          10        9.0       10.0        9.8        9.7        9.7
  13          12       11.0       11.0       10.0        9.6        9.9
  14          11       11.5       10.7       10.2       10.0       10.1
  15           9       10.0        9.7       10.4       10.7       10.3
  16           9        9.0        9.7       10.6       11.0       10.5
  17          11       10.0       11.0       10.8       10.7       10.7
  18          13       12.0       12.0       11.0       10.6       10.9
  19          12       12.5       11.7       11.2       11.0       11.1
  20          10       11.0       10.7       11.4       11.7       11.3
  21          10       10.0       10.7       11.6       12.0       11.5
  22          12       11.0       12.0       11.8       11.7
  23          14       13.0       13.0       12.0
  24          13       13.5       12.7
  25          11       12.0

* Each 2-period and 10-period average is centered halfway between the period on whose line it appears and the preceding period; for example, the first 10-period average, 8.5, covers periods 1 to 10.

Fig. 76. Automobile production in U.S., 1900-1941, with broken trends


From the relationships revealed in Worksheet No. 50 it is 
evident that a moving average has two important limitations as 
a measure of time series trend. 

1. No moving average can be as long as the period of time and 
series of data on which it is based. Even a 2-year moving average 
will always lack one year of coming up to date. A 5-year moving 
average will be two and one-half years short of the present time. 
This inherent defect of a moving average greatly limits its use as a 
trend measure. One of the most important services of a time series 
trend is its use as a means for forecasting future growth or change. 
To serve this purpose well, the trend should come up to the im- 
mediate present. 

2. The second limitation of a moving average is that it will not give a smooth trend unless the length of the period averaged is identical with, or a multiple of, the length of the business cycle. In Worksheet No. 50 the cycles are uniformly five time periods long. The student will note that only the 5-period and the 10-period moving averages give smooth trends. This is because the 5-period average is exactly the length of the cycle and the 10-period average is an even multiple of it. The 2-period, 3-period, and 7-period averages are so irregular that they are not usable. This limitation of a moving average is so great that it largely destroys its use as a trend measure. Business cycles are rarely of equal length; they vary from two to ten years in duration. A moving average that would fit the short cycles would make a poor trend for the long ones.

WORKSHEET NO. 51

Moving Average Trends for Automobile Production in U.S., 1900-1941

        Annual Automobile      3-Year      7-Year     10-Year
Years   Production (000)       Moving      Moving      Moving
                              Average     Average    Average*
1900               5
1901               5           6.3
1902               9           8.0
1903              10          14.0        15.0
1904              23          19.0        19.3
1905              24          25.3        26.4        31.3
1906              29          29.3        42.0        47.8
1907              35          40.7        64.9        64.1
1908              58          69.3        85.6        94.5
1909             115         114.3       111.0       139.7
1910             170         151.0       165.1       191.8
1911             168         217.0       277.6       279.0
1912             313         314.3       408.9       428.7
1913             462         443.0       591.0       599.8
1914             544         634.0       770.2       688.3
1915             896         988.7       946.2       842.6
1916           1,526       1,389.3     1,115.5     1,016.2
1917           1,746       1,405.0     1,290.9     1,143.6
1918             943       1,449.0     1,468.0     1,339.7
1919           1,658       1,502.0     1,676.4     1,656.0
1920           1,906       1,668.3     1,910.5     1,920.2
1921           1,442       1,874.0     2,212.0     2,204.1
1922           2,274       2,447.0     2,514.7     2,429.9
1923           3,625       3,028.3     2,798.0     2,548.9
1924           3,186       3,515.3     3,061.4     2,836.1
1925           3,735       3,568.3     3,333.5     3,129.0
1926           3,784       3,485.0     3,516.7     3,216.9
1927           2,936       3,511.7     3,529.1     3,270.0
1928           3,815       3,779.3     3,307.5     3,156.1
1929           4,587       3,729.0     3,020.6     2,950.9
1930           2,785       3,115.0     2,759.0     2,850.1
1931           1,973       1,964.3     2,590.8     2,801.8
1932           1,135       1,560.3     2,521.1     2,868.8
1933           1,573       1,653.3     2,561.6     3,008.5
1934           2,178       2,334.3     2,630.0     2,827.1
1935           3,252       3,291.3     2,790.8     2,661.3
1936           4,454       4,013.0     2,981.5     2,758.3
1937           4,333       3,596.0     3,245.3     2,942.6
1938           2,001       3,087.7
1939           2,929       2,895.0
1940           3,755       3,500.0
1941           3,816

* Each 10-year average is centered at the beginning of the year on whose line it appears; the first, 31.3, is the average for 1900-1909.

Fig. 77. Automobile production in U.S., 1900-1941, with ten-year and three-year moving averages

It is evident from Worksheet No. 51 and Fig. 77 that a moving 
average cannot be made to serve as an adequate trend for auto- 
mobile production. The reason is that the cycles are unequal in 
length and irregular in depth. While a three- or four-year moving 
average might fit the data fairly well up to 1922, after that date no 
moving average would fit. Even a 10-year moving average is in- 
adequate. The 10-year average is too high for the first 20 years. 
This is always true with data that are increasing at an increasing 
rate. For data with more regular cycles and a more even trend 
the moving average could serve quite well except for the fact 
that it would not extend up to the present. 
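The claim that only averages whose length equals the cycle, or a multiple of it, come out smooth can be checked directly on the Worksheet No. 50 data. In this sketch "smooth" is taken, as an assumption suited to this particular series, to mean that successive averages change by a constant amount, since the underlying trend of that worksheet is a straight line. The helper names are hypothetical.

```python
def moving_average_values(data, length):
    """Simple moving averages of `length` consecutive items."""
    return [sum(data[i:i + length]) / length
            for i in range(len(data) - length + 1)]


def is_smooth(values, tol=1e-9):
    """True if successive values change by a constant step (a straight line)."""
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    return all(abs(step - steps[0]) < tol for step in steps)


# Worksheet No. 50 data: cycles five periods long on a rising straight-line trend.
series = [6, 8, 10, 9, 7, 7, 9, 11, 10, 8, 8, 10, 12, 11, 9,
          9, 11, 13, 12, 10, 10, 12, 14, 13, 11]

for length in (2, 3, 5, 7, 10):
    print(length, is_smooth(moving_average_values(series, length)))
```

The loop reports a smooth result only for lengths 5 and 10. With real business data, whose cycles vary in length, no single length would pass such a test.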
LEAST SQUARES METHOD

The most satisfactory device for computing time series trends for short periods of from five to twelve years is the straight-line least squares method. For short periods of from one to three business cycles, most business trends are relatively straight. Even for longer periods a satisfactory result is often obtained by fitting a series of short straight-line trends to the data, as is shown for automobile production in Fig. 76. Such trends are much easier to fit than more complex curves and often give as good or nearly as good results.

But before they are fitted, the student should be convinced from a careful study of the development of the series through the entire period that there is a definite break in its long-time development. Such a break is fairly evident in the case of automobile production from 1920 to 1924. New improvements in cars and new highway construction after the First World War caused the automobile industry to reach saturation production quickly. It was an expanding industry up to 1922; after 1924 it was a mature industry. On the basis of any such definite break in trend, it is permissible to use two or more short trends instead of one long and more complex trend. The student must always be careful not to confuse a definite break in trend, caused by some clearly indicated long-time industrial change, with a merely large or violent cycle. If the change is purely cyclical, there should be no break in the trend. Some years usually must elapse before it is clear which kind of change has occurred. For the long-time trend of a business which has reached its peak and is declining, a simple parabola is often a good fit, as is shown for the number of horses in the United States. The number of banks in the United States since 1890 is accurately measured by the same method.

The factor which makes the least squares method so easy to apply to time series is that it may be abbreviated by a short cut. If the worksheet is set up so that the time periods (X) have their point of origin (0) at their mid-point, the sum of the X's (ΣX) becomes zero. This eliminates from the normal equations all terms containing ΣX, namely bΣX and aΣX. The first normal equation, Na + bΣX = ΣY, becomes Na = ΣY, and the second, aΣX + bΣX² = ΣXY, becomes bΣX² = ΣXY. These two equations reduce to

$$a = \frac{\Sigma Y}{N} \qquad \text{and} \qquad b = \frac{\Sigma XY}{\Sigma X^2}$$

for the straight line. The terms bΣX and aΣX drop out because ΣX = 0. This greatly reduces the work.

For the parabola the normal equations

$$Na + b\Sigma X + c\Sigma X^2 = \Sigma Y$$
$$a\Sigma X + b\Sigma X^2 + c\Sigma X^3 = \Sigma XY$$
$$a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4 = \Sigma X^2Y$$

become

$$b = \frac{\Sigma XY}{\Sigma X^2}, \qquad Na + c\Sigma X^2 = \Sigma Y, \qquad a\Sigma X^2 + c\Sigma X^4 = \Sigma X^2Y$$

which eliminates approximately one-half the work.

Similar short cuts apply to the cubic parabola and other longer 
equations. The student must remember that these short cuts 
apply only to time series. They apply only because time periods 
are equal and can be set up with the point of origin at the mid- 
point so that all odd powers of X cancel out. For all other data 
the basic formulas of Chapter 11 must be used. 
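The short cut for an odd number of periods, and its agreement with the full normal equations, can be sketched as follows using the 1935-1941 figures given in Worksheet No. 52. The function names are hypothetical; `general_line` solves the ordinary two normal equations with calendar years as X, which is the longer computation the short cut avoids.

```python
def short_cut_line(values):
    """Least squares line Y = a + bX for an odd number of equal periods.

    X is coded -k, ..., 0, ..., +k so that the sum of the X's is zero.
    """
    n = len(values)
    if n % 2 == 0:
        raise ValueError("this coding requires an odd number of periods")
    half = n // 2
    xs = range(-half, half + 1)
    a = sum(values) / n                                   # a = ΣY / N
    sum_xy = sum(x * y for x, y in zip(xs, values))
    sum_x2 = sum(x * x for x in xs)
    return a, sum_xy / sum_x2                             # b = ΣXY / ΣX²


def general_line(xs, ys):
    """Least squares line from the full normal equations, with any origin for X."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


# Production, 1935-1941, as given in Worksheet No. 52.
years = list(range(1935, 1942))
production = [3252.2, 3664.5, 3915.9, 2001.0, 2866.8, 3693.1, 3754.3]

a, b = short_cut_line(production)
print(round(a, 1), round(b, 2))                              # 3306.8 18.37

# The full method gives the same slope, and its 1938 trend value equals a.
a_full, b_full = general_line(years, production)
print(round(b_full, 2), round(a_full + b_full * 1938, 1))    # 18.37 3306.8
```

Only the origin of X differs between the two methods; the short cut simply places it at the middle year so that the ΣX terms vanish.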


WORKSHEET NO. 52

Short Cut Least Squares Trend for Automobile Production in U.S., 1935-1941

Years    X           Y           XY    X²
1935    -3     3,252.2     -9,756.6     9
1936    -2     3,664.5     -7,329.0     4
1937    -1     3,915.9     -3,915.9     1
1938     0     2,001.0            0     0
1939    +1     2,866.8     +2,866.8     1
1940    +2     3,693.1     +7,386.2     4
1941    +3     3,754.3    +11,262.9     9
Total    0    23,147.8       +514.4    28

(ΣXY = +21,515.9 - 21,001.5 = +514.4)
$$a = \frac{\Sigma Y}{N} = \frac{23{,}147.8}{7} = 3{,}306.8 \qquad \text{(Formula No. 60)}$$

$$b = \frac{\Sigma XY}{\Sigma X^2} = \frac{514.4}{28} = 18.37$$

$$Y = a + bX = 3{,}306.8 + 18.37X \qquad \text{(annual trend)}$$

$$Y = 275.57 + .13X \qquad \text{(monthly trend)}$$


CHANGING ANNUAL TO MONTHLY TRENDS

When a least squares monthly trend line is required, the labor is greatly reduced if the trend is first computed for years and later changed to monthly trend values by dividing the annual a by 12 and the annual b by 144, provided the original data used for the annual trend are annual totals. In Worksheet No. 52 above, the Y values are annual totals of passenger car production. The annual trend is Y = 3,306.8 + 18.37X. Dividing the annual a, 3,306.8, by 12 gives the monthly a, 275.57. If the trend had been computed originally from monthly data, the monthly a value would have been the same, but it would have required more than 12 times as much work to compute it. Since there are 12 months in a year, dividing the annual a by 12 reduces it to a monthly a.

The reason for dividing the annual b by 144 to reduce it to a monthly b is not so evident. It is that b is measured on both the Y and the X axes at the same time; it is a two-dimensional value. The constant a has only one dimension: it is always located where X = 0 and is measured only on the Y-axis. The slope b, however, measures the change on the Y-axis associated with a given change on the X-axis. When one adds 12 monthly quantities to obtain an annual quantity, he has, without realizing it, also added 12 monthly time periods of approximately 30 days each to make a 365-day year of 12 months. Changing monthly to annual data therefore requires an equal change on each axis: twelve monthly quantities of data are consolidated on Y to make an annual total, and 12 monthly time periods are consolidated on X to make a year. Thus 12 × 12 = 144, as illustrated in Fig. 78. In the case of the automobile data the annual b, 18.37, is divided by 144 to give approximately .13 (more exactly .1276) for the monthly b.

Fig. 78. Method of changing annual to monthly line of trend

SHIFTING MONTHLY TREND TO MID-MONTH

As has been stressed many times in this and in earlier chapters, all measurements for time periods must be centered in the middle of the time period. The reason for this requirement is that if there is any trend in the series at all, the values at the beginning and at the end of the month will not be identical. For instance, if the trend is upward, the trend value at the beginning of each month will be smaller than that at the end of the month. The first value will be too small to represent the month correctly, and the last value will be too large. Only the average, or middle, value will be the correct measure of the month. This fact, coupled with the fact that there is an even number (12) of months in a year, makes the yearly trend, centered at the middle of the year between June and July, or July 1, always fall at the end of a month. When, therefore, we change the trend from years to months, we must also shift the monthly trend from the first to the middle of the month. In Worksheet No. 52 the annual trend is centered at July 1, 1938. When this annual trend is changed to months, it must be shifted from July 1 to July 16. This is done by adding one-half of the monthly b to the monthly a, as follows:

$$Y = 275.57 + .13X$$

At July 1, 1938, where X = 0, Y = 275.57. If we substitute ½ for X, then Y = 275.57 + ½(.1276), or Y = 275.57 + .06 = 275.63, which is the monthly trend value for July 16, 1938. Having once centered our monthly trend values at the middle of the month, as of July 16, we compute all other monthly trend values by adding the full monthly b value to the value for the previous month, or subtracting it from the value for the following month. With X now counted in months from July 16, 1938, the monthly trend value of automobile production for August is Y = 275.63 + .13X with X = 1, or Y = 275.63 + .13 × 1 = 275.76. For September it is

$$Y = 275.63 + .13 \times 2 = 275.63 + .26 = 275.89,$$

and so on for all later months. For the earlier months the b value is subtracted, because the trend values decline as one moves back to earlier periods. This relationship is shown in Fig. 79.

Fig. 79. Method of shifting monthly trend of automobile production to mid-month
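The conversion and the half-month shift can be sketched as follows, starting from the annual trend of Worksheet No. 52. The function names are hypothetical. The final check is included only to show why b is divided by 144: when the twelve monthly trend values for 1939 (whose mid-months lie 6.5 to 17.5 months after July 1, 1938) are added together, they reproduce the annual trend value for 1939.

```python
def annual_to_monthly(a_annual, b_annual):
    """Annual-total trend (X in years) -> monthly-total trend (X in months)."""
    return a_annual / 12, b_annual / 144


def shift_to_mid_month(a_monthly, b_monthly):
    """Move the origin half a month forward, e.g. from July 1 to July 16."""
    return a_monthly + b_monthly / 2


a_m, b_m = annual_to_monthly(3306.8, 18.37)   # origin July 1, 1938
a_mid = shift_to_mid_month(a_m, b_m)          # origin mid-July 1938

for months_after_july, month in enumerate(["July", "August", "September"]):
    print(month, round(a_mid + b_m * months_after_july, 2))

# Check: the twelve monthly trend values for 1939 add up to the annual
# trend value for 1939 (X = 1 year in the annual equation).
total_1939 = sum(a_m + b_m * (6.5 + k) for k in range(12))
print(round(total_1939, 2), round(3306.8 + 18.37, 2))
```

Dividing a by 12 rescales the Y-axis from annual to monthly totals; the second factor of 12 in 144 rescales the X-axis from years to months, which is what the check confirms.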


SHORT CUT TREND FOR EVEN NUMBER OF YEARS 

Worksheet No. 52 contained an odd number of years (7) and, 
therefore, centered the problem on July 1 of the middle year. 
Sometimes it is impossible or undesirable to use an odd number of 
years. With a slight modification the computations may still 
be centered at the middle of the period and the short method used. 
WORKSHEET NO. 53

Short Cut Least Squares Trend for Automobile Production in U.S., 1936-1941

Years      X           Y            XY      X²
1936    -2.5     3,664.5     -9,161.25    6.25
1937    -1.5     3,915.9     -5,873.85    2.25
1938     -.5     2,001.0     -1,000.50     .25
1939     +.5     2,866.8     +1,433.40     .25
1940    +1.5     3,693.1     +5,539.65    2.25
1941    +2.5     3,574.3     +8,935.75    6.25
Total      0    19,715.6       -126.80   17.50

$$a = \frac{\Sigma Y}{N} = \frac{19{,}715.6}{6} = 3{,}285.93 \qquad Y = a + bX$$

$$b = \frac{\Sigma XY}{\Sigma X^2} = \frac{-126.80}{17.50} = -7.24 \qquad Y = 3{,}285.93 - 7.24X$$

This computation may be made in a slightly different form as 
follows: 

WORKSHEET NO. 54

Short Cut Least Squares Trend for Automobile Production in U.S., 1936-1941

Years    X           Y           XY    X²
1936    -5     3,664.5    -18,322.5    25
1937    -3     3,915.9    -11,747.7     9
1938    -1     2,001.0     -2,001.0     1
1939    +1     2,866.8     +2,866.8     1
1940    +3     3,693.1    +11,079.3     9
1941    +5     3,574.3    +17,871.5    25
Total    0    19,715.6       -253.6    70

$$a = \frac{\Sigma Y}{N} = \frac{19{,}715.6}{6} = 3{,}285.93 \qquad \text{(Formula No. 62)}$$

$$b = \frac{\Sigma XY / 2}{\Sigma X^2 / 4} = \frac{-253.6 / 2}{70 / 4} = \frac{-126.8}{17.5} = -7.24 \qquad \text{(Formula No. 63)}$$


This method excludes fractions from the worksheet but introduces compensating computations into the formula for b. Taking the X's as whole odd numbers doubles their size and therefore quadruples X²; dividing ΣXY by 2 and ΣX² by 4 restores b to a change per year.
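Both ways of coding an even number of years can be compared in a short sketch using the 1936-1941 figures of Worksheets Nos. 53 and 54. The function name and its `odd_integers` switch are hypothetical; with odd-integer coding the function applies the compensation of Formula No. 63, dividing ΣXY by 2 and ΣX² by 4.

```python
def short_cut_line_even(values, odd_integers=False):
    """Least squares line for an even number of years, origin at the mid-point.

    With odd_integers=False, X runs -2.5, -1.5, ..., +2.5 years from the
    center of the period. With odd_integers=True, X runs -5, -3, ..., +5,
    and b is corrected back to a change per year.
    """
    n = len(values)
    if n % 2:
        raise ValueError("this coding requires an even number of years")
    xs = [i - (n - 1) / 2 for i in range(n)]
    if odd_integers:
        xs = [2 * x for x in xs]
    a = sum(values) / n
    sum_xy = sum(x * y for x, y in zip(xs, values))
    sum_x2 = sum(x * x for x in xs)
    if odd_integers:
        b = (sum_xy / 2) / (sum_x2 / 4)   # Formula No. 63
    else:
        b = sum_xy / sum_x2
    return a, b


production = [3664.5, 3915.9, 2001.0, 2866.8, 3693.1, 3574.3]   # 1936-1941

print(short_cut_line_even(production))
print(short_cut_line_even(production, odd_integers=True))
```

Both calls return an a of about 3,285.93 and a b of about -7.246 per year, which the worksheets carry as -7.24.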

Frequently in a time series there is a distinct break, such as occurred in the automobile industry in 1922 after the First World War. The trend of such a series often may be measured better with two or more straight-line regressions than with a parabola, an exponential curve, or another continuous curve. This is illustrated in Fig. 76.


PARABOLA AS TIME SERIES TREND 

If a business has reached its peak and begun to decline, its trend may well be measured by a simple parabola, as is shown in Worksheet No. 55 for the horse population of the United States, 1871-1941. This type of trend is illustrated by Fig. 70, and by Fig. 71 for the number of banks in the United States, 1850-1940.

WORKSHEET NO. 55 


Parabola Fitted to Number of Horses in the United States 
BY Decades 1871-1941, (in 100,000) 


Year 

X 

Y 

XY 

X2 

X3 

X4 

X^F 

1871 

-* 3.5 

81 

- 283.5 

12.25 

- 42.875 

150.0625 

992.25 

1881 

- 2.5 

112 

- 280.0 

6.25 

- 15.625 

39.0625 

700.00 

1891 

- 1.5 

163 

- 244.5 

2.25 

- 3.375 

5.0625 

366.75 

1901 

- .5 

180 

- 90.0 

.25 

- .125 

.0625 

45.00 

1911 

+ .5 

204 

+ 102.0 

.25 

+ .125 

.0625 

51.00 

1921 

+ 1.5 

194 

+ 291.0 

2.25 

+ 3.375 

5.0625 

436.50 

1931 

+ 2.5 

132 

+ 330.0 

6.25 

+ 15.625 

39.0625 

825.00 

1941 

+ 3.5 

104 

+ 364.0 

12.25 

+ 42.875 

150.0625 

1,274.00 


0 • 

1,170 

+ 189.0 

42.00 

0 

388.6000 

4,690.50 
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Formula No. 64 


SZ2 


189.0 

42.0 


= 4.5 


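Formula No. 65

$$Na + c\Sigma X^2 = \Sigma Y$$
$$a\Sigma X^2 + c\Sigma X^4 = \Sigma X^2 Y$$

$$
\begin{aligned}
8a + 42c &= 1{,}170 && \text{or} \quad a + 5.25c = 146.25 \\
42a + 388.5c &= 4{,}690.5 && \text{or} \quad a + 9.25c = 111.679 \\
-4.00c &= 34.571 && \text{(subtracting the second reduced equation from the first)} \\
c &= -8.643 \\
8a + 42(-8.643) &= 1{,}170 \\
8a &= 1{,}170 + 363.006 = 1{,}533.006 \\
a &= 191.625
\end{aligned}
$$

$$Y = a + bX + cX^2 = 191.625 + 4.5X - 8.643X^2$$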
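The whole fit can be reproduced in a few lines. The Python sketch below, assuming the data of Worksheet No. 55, forms the sums, takes b from Formula No. 64, and solves the two equations of Formula No. 65 by Cramer's rule. Exact fractions replace the rounded hand arithmetic, so c appears as -121/14 (about -8.643) and a as exactly 191.625.

```python
from fractions import Fraction

# Worksheet No. 55: horse population by decades, 1871-1941 (in 100,000)
X = [Fraction(k, 2) for k in range(-7, 8, 2)]   # -3.5 ... +3.5
Y = [81, 112, 163, 180, 204, 194, 132, 104]
N = len(Y)

sum_y = sum(Y)
sum_x2 = sum(x ** 2 for x in X)
sum_x4 = sum(x ** 4 for x in X)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2y = sum(x ** 2 * y for x, y in zip(X, Y))

# Because sum(X) = sum(X^3) = 0, b has its own equation (Formula No. 64)
b = sum_xy / sum_x2

# Remaining pair (Formula No. 65), solved by Cramer's rule:
#   N*a      + sum_x2*c = sum_y
#   sum_x2*a + sum_x4*c = sum_x2y
det = N * sum_x4 - sum_x2 * sum_x2
a = (sum_y * sum_x4 - sum_x2 * sum_x2y) / det
c = (N * sum_x2y - sum_x2 * sum_y) / det

def trend(x):
    """Computed trend value Yc of the fitted parabola."""
    return a + b * x + c * x ** 2

print(float(a), float(b), round(float(c), 3))   # 191.625 4.5 -8.643
print([round(float(trend(x)), 1) for x in X])
```

Evaluated at the eight X values, this equation gives 70.0, 165.4, 178.9 and 101.5 at X = -3.5, -1.5, +1.5 and +3.5, as in Worksheet No. 56; at X = -2.5, -.5, +.5 and +2.5 it gives 126.4, 187.2, 191.7 and 148.9, slightly above the figures printed there.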

WORKSHEET NO. 56

Computation of Standard Error of Estimate for Simple Parabola for Even Number of Years

   X      X²      Y      Yc       Z        Z²
-3.5   12.25     81    70.0   +11.0    121.00
-2.5    6.25    112   125.7   -13.7    187.69
-1.5    2.25    163   165.4    -2.4      5.76
 -.5     .25    180   186.7    -6.7     44.89
 +.5     .25    204   191.2   +12.8    163.84
+1.5    2.25    194   178.9   +15.1    228.01
+2.5    6.25    132   148.2   -16.2    262.44
+3.5   12.25    104   101.5    +2.5      6.25
Total                                1,019.88

$$S_y = \sqrt{\frac{\Sigma Z^2}{N}} = \sqrt{\frac{1{,}019.88}{8}} = \sqrt{127.48} = 11.3$$
STANDARD ERROR OF ESTIMATE

The Yc values in Worksheet No. 56 are the points on the Y-axis for the several years through which the parabola passes. They are the estimated values for Y. The letter Z is used to indicate the deviation between the computed values of Y, or Yc, and the actual values of Y. For instance, in 1871 the value of the parabola, 70.0 (7,000,000 horses), fell 1,100,000 horses, or 11.0 (in 100,000), below the actual value. In 1881 the parabola deviated from the actual number of horses by 13.7, or 1,370,000 horses. These deviations indicate how well or poorly the parabola fits the data.

Sy, or the standard error of estimate, is the root-mean-square value of these deviations between the computed trend and the original data. It is based on the same general method as the standard deviation, and it measures the scatter of the data about the trend line as the standard deviation measures the scatter of the data around their mean. The smaller the standard error of estimate, the more closely the trend line fits the data.
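The root-mean-square computation of Worksheet No. 56 is easy to express directly. This minimal Python sketch takes the Y and Yc columns exactly as printed in that worksheet and divides ΣZ² by N, as the worksheet does; the function name is illustrative only.

```python
import math

# Worksheet No. 56: actual values Y and computed trend values Yc (in 100,000 horses)
Y = [81, 112, 163, 180, 204, 194, 132, 104]
Yc = [70.0, 125.7, 165.4, 186.7, 191.2, 178.9, 148.2, 101.5]

def standard_error_of_estimate(actual, estimated):
    """Root-mean-square deviation from the trend: sqrt(sum(Z^2) / N), with Z = Y - Yc."""
    z = [y - yc for y, yc in zip(actual, estimated)]
    return math.sqrt(sum(d * d for d in z) / len(z))

s_y = standard_error_of_estimate(Y, Yc)
print(round(s_y, 1))  # 11.3
```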

Limitation of Standard Error of Estimate as Applied 
to Time Series 

The standard error of estimate as developed in Chapter 11 in relation to regression rests on two assumptions: (1) random samples from a large population, and (2) populations not subject to internal change during the time of sampling. Both of these basic assumptions for a dependable standard error of estimate are often violated in computing standard errors of estimate for time series.

The successive items in a time series are not completely independent and do not represent a pure random sample. The value of successive items often depends in large measure on earlier items, as in the growth of population of a city, or the sales of a product due to increasing good will. The sample is not truly random. There is usually some internal change going on in the population over a period of time which tends to destroy the homogeneity of the sample.¹

¹ See M. Ezekiel, Methods of Correlation Analysis, Chapter 19. John Wiley & Sons, New York, 1941.






REDUCING PARABOLA TRENDS TO SHORTER TIME PERIODS

The parabola for the horse population of the United States was computed for data by decades because of the length of the period covered and the large amount of work required to compute such a trend by years for a period of seventy years. The trend by decades can be changed into an annual trend by dividing the total change in Yc for a decade by 10 and adding this amount to the Yc for the first year to obtain the Yc for the second year, and so on for each successive year.

WORKSHEET NO. 57

Year      Yc     Difference   Difference ÷ 10
1876     70.0
1886    125.7       55.7            5.57

5.57 = change for each year from 1876 to 1886. (Decade values are centered on the middle of the decade: 1876, 1886, 1896, etc.)

Year      Yc
1876     70.00
1877     75.57
1878     81.14
1879     86.71
1880     92.28
1881     97.85
1882    103.42
1883    108.99
1884    114.56
1885    120.13
1886    125.70

Year      Yc     Difference   Difference ÷ 10
1886    125.7
1896    165.4       39.7            3.97

3.97 = change for each year from 1886 to 1896.

By the same method the yearly trend values may be computed for each year up to 1941. This method is only an approximation because it assumes that the curve changes at an even rate throughout the decade, although the rate of change actually varies from year to year.
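The even-rate method of Worksheet No. 57 is ordinary linear interpolation between two decade values. A minimal Python sketch, assuming the decade values 70.0 (1876) and 125.7 (1886) from that worksheet; `interpolate_decade` is a hypothetical helper name:

```python
def interpolate_decade(start_year, start_value, end_value, steps=10):
    """Spread the change between two decade trend values evenly over the years."""
    step = (end_value - start_value) / steps          # 55.7 / 10 = 5.57 per year here
    return {start_year + k: round(start_value + k * step, 2) for k in range(steps + 1)}

yearly = interpolate_decade(1876, 70.0, 125.7)
print(yearly[1877], yearly[1881], yearly[1886])  # 75.57 97.85 125.7
```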

A more accurate method is to substitute in the regression equation successive values of X increased by .1 for each successive year of the decade, as follows:

WORKSHEET NO. 58

Y = 191.625 + 4.5X - 8.643X²

1871   Y = 191.625 + (4.5 × -4.0) - 8.643 × (-4.0)²
1872   Y = 191.625 + (4.5 × -3.9) - 8.643 × (-3.9)²
1873   Y = 191.625 + (4.5 × -3.8) - 8.643 × (-3.8)²
1874   Y = 191.625 + (4.5 × -3.7) - 8.643 × (-3.7)²
1875   Y = 191.625 + (4.5 × -3.6) - 8.643 × (-3.6)²
1876   Y = 191.625 + (4.5 × -3.5) - 8.643 × (-3.5)²
 . . .
1941   Y = 191.625 + (4.5 × +4.0) - 8.643 × (+4.0)²

This method requires more work but is more accurate. 
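The gain in accuracy can be seen by running both methods on the fitted equation. In the Python sketch below (function names are illustrative), X advances by .1 per year as in Worksheet No. 58, while the even-rate values are interpolated between the equation's own values at X = -3.5 and X = -2.5. Those endpoints are recomputed from the equation, so the 1886 endpoint is 126.36 rather than the 125.7 carried in Worksheet No. 57.

```python
# Trend equation of Formula No. 65: X in decades, Y in 100,000 horses
A, B, C = 191.625, 4.5, -8.643

def parabola(x):
    return A + B * x + C * x * x

def yearly_by_substitution(first_year, first_x, n_years):
    """Worksheet No. 58 method: advance X by .1 (one-tenth of a decade) each year."""
    return {first_year + k: parabola(first_x + 0.1 * k) for k in range(n_years)}

def yearly_by_even_steps(first_year, first_x):
    """Worksheet No. 57 method: spread the decade's change in Yc evenly over 10 years."""
    start, end = parabola(first_x), parabola(first_x + 1.0)
    return {first_year + k: start + (end - start) * k / 10 for k in range(11)}

curve = yearly_by_substitution(1876, -3.5, 11)   # 1876 through 1886
chord = yearly_by_even_steps(1876, -3.5)
for year in (1876, 1881, 1886):
    print(year, round(curve[year], 2), round(chord[year], 2))
# 1876 70.0 70.0
# 1881 100.34 98.18
# 1886 126.36 126.36
```

The two methods agree at the ends of the decade; in the middle the even-rate line falls short of the curve by about 2.16 (roughly 216,000 horses), which for a parabola equals one-quarter of |c| when the interval is one unit of X.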

By these same methods parabolas computed for annual data 
may be changed to monthly trend lines. The steps for the first 
method are: 

1. Compute the annual parabola for the monthly averages of 
the years. 

2. Divide the differences between the successive mid-year 
trend values by 12. 

3. Beginning with the July 1 value of the first year, add ½ of the quotient obtained in step No. 2 to obtain the mid-July value.

4. Add the full quotient to the mid-July value to obtain the mid-August value, and continue to add the full quotient to each preceding month to obtain the value for the following month until mid-July of the second year is reached.

5. Repeat the process for each of the following years included in the original trend.

For the second and more accurate method of changing annual parabola trends to monthly trends, substitute successively into the regression equation the X value increased by 1/12, or about .08, for each month, as 1.00 for July, 1.08 for Aug., 1.16 for Sept., etc.
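Both monthly procedures can be sketched in Python. The annual parabola below (a = 100, b = 6, c = -.5, X in years) is entirely hypothetical, and exact 1/12 steps are used where the text rounds to .08. As the text describes them, the first method yields mid-month values, while the second labels July as X itself, so its values fall half a month earlier; adding 1/24 to X would put them at mid-month.

```python
# Hypothetical annual parabola Y = a + bX + cX^2 (X in years, values at July 1)
a, b, c = 100.0, 6.0, -0.5

def annual(x):
    return a + b * x + c * x * x

def monthly_by_steps(x):
    """First method: q = 1/12 of the change between successive July 1 values;
    mid-July = July 1 value + q/2, then add q for each following month."""
    q = (annual(x + 1) - annual(x)) / 12
    mid_july = annual(x) + q / 2
    return [mid_july + k * q for k in range(12)]      # mid-July through mid-June

def monthly_by_substitution(x):
    """Second method: substitute X, X + 1/12, X + 2/12, ... for July, Aug., Sept., ..."""
    return [annual(x + k / 12) for k in range(12)]

steps = monthly_by_steps(1)
subst = monthly_by_substitution(1)
print(steps[0], subst[0])  # 105.6875 105.5
```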
RATIO TRENDS OR LOG-LINES

Whenever the original data plotted on a semi-logarithmic scale fall in a straight line, they are best represented by a ratio or logarithmic trend. Computed by the short method, the normal equations reduce to

Formula No. 66

$$b = \frac{\Sigma X \log Y}{\Sigma X^2}$$

Formula No. 67

$$a = \frac{\Sigma \log Y}{N}$$

for the equation

$$\log Y = a + bX$$

This line fits automobile production quite well, as plotted on Fig. 80 and computed for the years 1903 to 1917 inclusive.


WORKSHEET NO. 59

Logarithmic Trend Fitted to Automobile Production in U.S., 1903-1917, Inclusive (Data in 1,000 of Cars)

                                                          Trend in     Trend in
Year    X       Y      Log Y     X²     X Log Y         Logarithms   Natural Numbers
1903   -7      10    1.00000     49    -7.00000         1.042655          11
1904   -6      23    1.36173     36    -8.17038         1.201572          16
1905   -5      24    1.38021     25    -6.90105         1.360489          23
1906   -4      29    1.46240     16    -5.84960         1.519406          33
1907   -3      37    1.56820      9    -4.70460         1.678323          48
1908   -2      58    1.76343      4    -3.52686         1.837240          69
1909   -1     115    2.06070      1    -2.06070         1.996157          99
1910    0     170    2.23045      0                     2.155074         143
1911   +1     168    2.22531      1    +2.22531         2.313991         206
1912   +2     313    2.49554      4    +4.99108         2.472908         297
1913   +3     462    2.66464      9    +7.99392         2.631825         428
1914   +4     544    2.73560     16   +10.94240         2.790742         618
1915   +5     896    2.95231     25   +14.76155         2.949659         890
1916   +6   1,526    3.18355     36   +19.10130         3.108676       1,284
1917   +7   1,746    3.24204     49   +22.69428         3.267493       1,851
Total   0           32.32611    280   +44.49665
$$b = \frac{\Sigma X \log Y}{\Sigma X^2} = \frac{44.49665}{280} = .158917$$

$$a = \frac{\Sigma \log Y}{N} = \frac{32.32611}{15} = 2.155074$$

$$\log Y = a + bX = 2.155074 + .158917X$$


[Fig. 80. Automobile production in U.S., 1903-1917, and trend. Arithmetic scale; vertical axis: automobile production in 1,000 cars. (Standard and Poor's)]
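Formulas No. 66 and No. 67 are the straight-line short method applied to log Y instead of Y. The Python sketch below, assuming the production figures of Worksheet No. 59, fits the line in logarithms and converts the trend back to natural numbers with 10 raised to the trend value. It takes logarithms with `math.log10` rather than from five-place tables, so the last digit of a or b may differ very slightly from the worksheet.

```python
import math

# Worksheet No. 59: U.S. automobile production, 1903-1917 (thousands of cars)
production = [10, 23, 24, 29, 37, 58, 115, 170, 168, 313, 462, 544, 896, 1526, 1746]
X = list(range(-7, 8))                          # X = 0 at 1910, the middle year
log_y = [math.log10(y) for y in production]

a = sum(log_y) / len(log_y)                                          # Formula No. 67
b = sum(x * ly for x, ly in zip(X, log_y)) / sum(x * x for x in X)   # Formula No. 66

# Trend in logarithms converted back to natural numbers
trend = {1910 + x: 10 ** (a + b * x) for x in X}
print(round(a, 4), round(b, 4))                 # 2.1551 0.1589
print(round(trend[1910]), round(trend[1917]))   # 143 1851
print(round(10 ** b, 3))                        # trend ratio from one year to the next
```

The last line shows why such a line is called a ratio trend: 10 raised to b is about 1.44, so each year's trend value is roughly 44% above the year before.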


In all the worksheets and problems presented up to this point in the chapter, the trends have been computed for annual data and for the quantity production of commodities: automobiles, horses, etc. In dealing with quantities of commodities, the annual data are easy to manipulate by the methods already explained.

In dealing with prices, however, annual totals are meaningless and are better replaced by annual averages. In fact, there is no such thing as an annual total of wheat prices, egg prices, or other prices. The only meaningful figure we can have for the year is the annual average price. In the following worksheet the trend of egg prices at Chicago is fitted to the average price of eggs, which is a simple arithmetic mean of the monthly prices.

WORKSHEET NO. 60

Straight-Line Least Squares Trend for Wholesale Egg Prices, Chicago, 1931-1941

Year     X       Y        XY      X²
1931    -5    20.2    -101.0      25
1932    -4    17.9     -71.6      16
1933    -3    15.8     -47.4       9
1934    -2    19.3     -38.6       4
1935    -1    25.3     -25.3       1
1936     0    24.2                 0
1937    +1    21.9     +21.9       1
1938    +2    21.3     +42.6       4
1939    +3    17.6     +52.8       9
1940    +4    18.9     +75.6      16
1941    +5    25.5    +127.5      25
Total    0   227.9    +320.4     110
                      -283.9
                       +36.5


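$$a = \frac{\Sigma Y}{N} = \frac{227.9}{11} = 20.72$$

$$b = \frac{\Sigma XY}{\Sigma X^2} = \frac{36.5}{110} = .332$$

$$Y = a + bX = 20.72 + .332X \qquad \text{(annual equation)}$$

$$Y = 20.72 + \frac{.332}{12}X = 20.72 + .028X \qquad \text{(monthly equation)}$$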

Since the annual figures are already averages of the monthly prices, the only change required to reduce the annual trend to a monthly trend is to divide the b value by 12. It will be remembered that when annual totals were used, as in Worksheet No. 52, a had to be divided by 12 and b by 144. But when the annual average price or production is used, the annual totals have already been divided by 12 once in order to get the average. The monthly and annual a's in such cases are identical. All that is necessary in such computations is to reduce the annual b to a monthly b by dividing the yearly b by 12. This is not a contradiction or invalidation of the annual totals method. It simply completes a job that was half performed when the original annual totals were divided by 12 once to reduce them to annual averages. In computing trends for prices, it is preferable to use this method of annual averages. It is also just as applicable to production or quantity data.
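The equivalence of the two routes can be verified numerically. This Python sketch fits the egg-price averages of Worksheet No. 60 once as given and once after multiplying each average by 12 to form hypothetical annual totals; dividing a by 12 and b by 144 in the totals fit lands on the same monthly equation as dividing only b by 12 in the averages fit. The helper name `short_method_line` is illustrative.

```python
# Worksheet No. 60: annual average wholesale egg prices, Chicago, 1931-1941
prices = [20.2, 17.9, 15.8, 19.3, 25.3, 24.2, 21.9, 21.3, 17.6, 18.9, 25.5]
X = list(range(-5, 6))                          # X = 0 at 1936

def short_method_line(xs, ys):
    """a = sum(Y)/N and b = sum(XY)/sum(X^2); valid because sum(X) = 0."""
    a = sum(ys) / len(ys)
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return a, b

# Fitted to annual averages: a is unchanged, b is divided by 12
a_avg, b_avg = short_method_line(X, prices)
monthly_from_averages = (a_avg, b_avg / 12)

# Fitted to annual totals (12 times each average): a is divided by 12, b by 144
a_tot, b_tot = short_method_line(X, [12 * p for p in prices])
monthly_from_totals = (a_tot / 12, b_tot / 144)

print(round(a_avg, 2), round(b_avg, 3))               # 20.72 0.332
print([round(v, 3) for v in monthly_from_averages])   # [20.718, 0.028]
print([round(v, 3) for v in monthly_from_totals])     # [20.718, 0.028]
```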

EXTRAPOLATION 

Trend lines may be extended, or projected, beyond the limits of the data on which they are based. For instance, a trend line based on data from 1931 to 1941 inclusive might be projected along the same curve for 1942 or even later years. Such a practice is useful to the extent that the extended trend correctly forecasts the future movement of the series. The further the trend is extrapolated, the less likely it is to be a correct measure of the future development of the business. The further back into the past the trend extends, the more likely it is to measure the future correctly, but only if the same conditions continue to hold. The trend of growth of Chicago can be projected more accurately than that of a new mining town. The size and nature of the business are important factors in any such extrapolation.
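Mechanically, extrapolation is nothing more than substituting an X outside the fitted range. The Python sketch below does this with the annual egg-price trend of Worksheet No. 60 and labels which years lie beyond the 1931-1941 data. It shows the arithmetic only; the projected figures are not forecasts, and the cautions above still apply.

```python
# Annual trend of Worksheet No. 60: Y = 20.72 + .332X, with X = 0 at 1936
A, B, ORIGIN = 20.72, 0.332, 1936
FIRST, LAST = 1931, 1941                        # years covered by the data

def trend_value(year):
    """Straight-line trend for any year; outside FIRST..LAST it is an extrapolation."""
    return A + B * (year - ORIGIN)

for year in (1941, 1942, 1945):
    label = "within data" if FIRST <= year <= LAST else "extrapolated"
    print(year, round(trend_value(year), 2), label)
# 1941 22.38 within data
# 1942 22.71 extrapolated
# 1945 23.71 extrapolated
```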


SUMMARY 

1. Secular trend is that variation in a time series that is due to long-time expansion or contraction, growth or decay.

2. A secular trend line may be obtained by plotting the data on co- 
ordinate graph paper and then drawing through these plotted points a 
freehand line so that the area of the plotted points above the line is as 
nearly equal as possible to the area of the plotted points below the line. 

3. A secular trend line may be obtained by dividing the data into two approximately equal parts, computing the mean of each part, and drawing a straight line through these plotted averages on the coordinate scale on which the data have been plotted.

4. A secular trend line may be obtained by means of a moving average. 
To secure a dependable trend by this method the number of time periods 
for which each average is taken should be as nearly as possible the length 
of the cyclical movements in the series. If the cycle is four years long, 
the moving averages should be four years long. The number of items 
in each average should be uniform throughout the period covered. 

5. The least squares method of securing a trend line is a mathematical method which locates the trend line so that the sum of the squared deviations of the items of data from the line is a minimum.

6. A monthly least squares trend line can be secured with a minimum 
of time and effort by fitting the line to annual totals or averages and then 
shifting the line from the annual values to monthly values. Such a pro- 
cedure saves about nine-tenths of the work otherwise required. 

7. The standard error of estimate for time series data is not as dependable or valid as for random samples of spatial data, because its valid use is based on the relationships of the normal curve and random sampling, neither of which can hold fully in a time series. The items of a time series are not free random items but are largely interdependent. The values of present periods are partially controlled and determined by the conditions of past periods.

8. Extrapolation is the projection of a trend line beyond the limits of the actual data employed in the problem. Extrapolation is not dependable in most cases when extended far beyond the data. (a) The longer the period of past data, the longer the future period for which extrapolation is useful. (b) The larger the area included in the measurements, the safer is the extrapolation. Extending the trend of growth for Chicago or New York would likely result in less error than extrapolating the trend of a village or new mining camp.
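Items 3 and 4 describe two quick alternatives to least squares. The Python sketch below applies both to the egg prices of Worksheet No. 60, used here only as convenient sample data. Two choices in it are assumptions, not rules from the text: with an odd number of years the middle year is left out of the semi-averages, and the 3-year moving-average period is arbitrary (item 4 calls for a period matching the length of the cycle).

```python
# Egg prices of Worksheet No. 60, used here only as sample data
years = list(range(1931, 1942))
prices = [20.2, 17.9, 15.8, 19.3, 25.3, 24.2, 21.9, 21.3, 17.6, 18.9, 25.5]

def semi_average_line(years, values):
    """Item 3: join the means of the first and second halves with a straight line.
    With an odd number of years the middle year is left out (an assumption here)."""
    half = len(values) // 2
    x1, y1 = sum(years[:half]) / half, sum(values[:half]) / half
    x2, y2 = sum(years[-half:]) / half, sum(values[-half:]) / half
    slope = (y2 - y1) / (x2 - x1)
    return lambda year: y1 + slope * (year - x1)

def moving_average(values, period):
    """Item 4: average of each run of `period` items; for an odd period each
    average belongs to the middle item of its run."""
    return [sum(values[i:i + period]) / period for i in range(len(values) - period + 1)]

line = semi_average_line(years, prices)
print(round(line(1933), 2), round(line(1939), 2))             # 19.7 21.04
print([round(v, 2) for v in moving_average(prices, 3)][:3])   # [17.97, 17.67, 20.13]
```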
REVIEW QUESTIONS

1. Define secular trend and differentiate it from other time series 
fluctuations. 

2. What is the difference between the arithmetic and logarithmic 
scales? 

3. What are the limitations and weaknesses of a moving average 
trend? 

4. Under what conditions may two or more short straight-line trends 
be used on a single time series? 

5. What condition and device permit the shortening of the normal 
least squares equations for time series? Explain fully. 

6. Write the original normal equations and the shortened time series 
normal equations for a parabola. 

7. How may an annual trend line be changed into a monthly trend 
line? 

8. Why is the annual total b divided by 144 to reduce it to a monthly b?

9. What change is made in the worksheet when an even number of 
years is used instead of an odd number? 

10. What type of data is preferable when annual trend lines are fitted 
to prices? 

11. How may an annual parabola be changed to a monthly parabola? 

12. What should be the characteristic of data for which a Log-line is 
used? 

13. What is meant by extrapolating a trend line? Under what con- 
ditions is it desirable? 

14. Name three principal uses of trend lines and explain each use. 

EXERCISES 

1. Compute trends for the data in the three exercises at the end of 
Chapter 12. 

2. Compute the trend for the number of banks in the United States. 

3. Compute the trend for the number of airplanes exported from the 
data given in Chapter 11. 

4. Compute the trend of some local business, store, factory, or service 
in your city. 



CHAPTER 17 


SEASONAL VARIATION 


As defined in the previous chapter, seasonal variation is the month-to-month change in a time series due to the time of year. Since the principal factors causing seasonal change are relatively uniform and permanent, this movement may be considered as entirely normal. Summer and winter, spring and autumn, the rainy season and the dry season, planting and reaping, and their effects on customs and social activities result in about the same month-to-month change, year after year. More ice is sold in the summer, and more coal for heating homes in the winter. The prices of strawberries and wheat are lowest in the month of principal harvest and are higher in months when the supply is less. Seasonal variation is the simplest form of time series change and is the easiest to observe and measure.

Before the various methods of measuring seasonal changes are presented, some preliminary considerations necessary for accurate measurement must be noted. The first is the imperfection of the calendar as a device for measuring business time. The months are of unequal length. February is ordinarily only 90.3% as long as January or March. If total business for February is less than that of its two companion months, it by no means indicates a seasonal decline unless it falls to a figure less than about 90% of the January total. The volume of business per day in this short month might be as large as or larger than the daily volume of January, and the monthly total still be less than that of the previous month. In certain lines of business, such as freight car loadings and banking, February is still further handicapped because it contains two common holidays, Washington's birthday and Lincoln's birthday. For all businesses affected by Sundays and legal holidays, there are really only 22 business days in February as compared with 26 or 27 in both January and March. Any seasonal computation to be accurate must take into consideration differences in the length of months.
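One common way to allow for unequal months is to convert each monthly total to a rate per day before computing the seasonal index. The Python sketch below uses the standard `calendar` module to count calendar days and non-Sunday days. It does not remove legal holidays (that would need a holiday list, which is not given here), and the January and February totals are hypothetical.

```python
import calendar

def days_in_month(year, month):
    # calendar.monthrange returns (weekday of the 1st, number of days in the month)
    return calendar.monthrange(year, month)[1]

def non_sunday_days(year, month):
    """Days in the month other than Sundays; legal holidays are NOT removed here."""
    return sum(1 for day in range(1, days_in_month(year, month) + 1)
               if calendar.weekday(year, month, day) != calendar.SUNDAY)

def per_day(total, year, month, business_days=False):
    """Monthly total expressed as an average per calendar day or per non-Sunday day."""
    days = non_sunday_days(year, month) if business_days else days_in_month(year, month)
    return total / days

print(round(100 * 28 / 31, 1))                             # 90.3
print([non_sunday_days(1941, m) for m in (1, 2, 3)])       # [27, 24, 26]
# Hypothetical totals: February is smaller, yet the daily rate is unchanged
print(per_day(310.0, 1941, 1), per_day(280.0, 1941, 2))    # 10.0 10.0
```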

Not only are the months of different length and unequally 
affected by holidays, but they also vary from year to year in the 
number of Sundays and holidays per month. Easter is a shifting 
holiday of special significance for business. It sometimes falls in 
March but more often in April. If Easter is in March and that 
same year March has five Sundays, it will have but 26 business 
days while April that same year will have 26 business days. If 
the next year Easter is in April and that year April has five Sundays, it will have only 25 business days while March has 27, or a
difference of 4%. 

A third point to be predetermined is the unit in which the seasonal index is computed. Should it be measured as a weekly or
monthly change? Ordinarily the amount of seasonal change 
from one week to the next is so small that it is almost impercep- 
tible. In most parts of the temperate zone the see-saw, or uneven 
fluctuations of the weather, would make the week an undesirable 
unit for seasonal measure. The average temperature of April 
is always higher than that of March for the same locality, but the 
second week in April may be much colder than the third week of 
March. The longer period of the month averages out this uneven 
fluctuation of weather to a large degree with its consequent ef- 
fects on business. It is usually agreed, therefore, that the month 
is the correct time unit for measuring seasonal variation. 


AMPLITUDE AND UNIFORMITY 

By amplitude is meant the range of seasonal variation from the 
lowest to the highest month. In some series the amplitude is 
quite small, as for instance the sale of table salt. Approximately 
the same amount is consumed in each month of the year. If we 
take the average monthly consumption as 100, we find that the 
lowest month is only a little below 100 while the highest month is 
very little above 100. In some other series, as for instance the
canning of tomatoes or peas, the highest month may be three or 
four hundred percent of the annual average while the lowest 
month is zero. Most agricultural and retail merchandising series 
have a seasonal variation of wide amplitude. Series representing 
the consumption of daily necessities, such as bread, milk, tobaccos, 
shoes, and the like, have a narrow seasonal amplitude. 

A point of large importance in the computation of seasonal 
variation is the regularity of the seasonal movement from year to 
year. After the inequalities in the length of months within one 
year and between years have been eliminated by proper adjust- 
ments, the question remains: do the highs and lows of succeeding
years fall at approximately the same points of time? If they do, 
a uniform seasonal variation is easily computed. If the low and 
high points shift about from year to year, more than one seasonal 
index may be necessary. This point is illustrated by the seasonal 
variation of prices in certain agricultural products, especially 
cotton. The seasonal price movement of a small crop after a 
large crop is often the opposite of that of a large crop after a 
large crop. Other combinations of this variation are frequent. 
In computing a seasonal index the student should note whether 
the movement is regular from year to year or highly variable. 


FIVE METHODS OF COMPUTING SEASONAL VARIATION 

In this text the five following methods of computing seasonal 
variation are presented: 

1. Monthly Medians of Original Data 

2. Monthly Percentages of Each Year's Data

3. Deviation from Least Squares Trend

4. Link Relative Method 

5. Deviations from 12-Month Moving Average 

The first two methods are the easiest to compute but are the 
least accurate. The last three methods require much more time 
and labor to compute but are much more accurate. 
First Method

Monthly Medians of Original Data 

Steps in Computation: 

1. Arrange the original data in an array. This step facilitates 
the computation when several years are used. 

2. Select the median value of each month as the average for that month.

3. Total the twelve monthly medians. For No. 1 Index in Worksheet No. 61, based on 5 years, 1930-34 inclusive, the total of the twelve median values is 2,054.8.

4. Divide the total of the twelve monthly medians by 12 to get the monthly average. For No. 1 Index in Worksheet No. 61 this computation is 2,054.8 ÷ 12 = 171.23.

5. Reduce the twelve monthly medians to percentages by dividing each one by the monthly average, as

112.8 ÷ 171.23 = 65.8, 180.4 ÷ 171.23 = 105.4, etc.
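Steps 2 to 5 translate almost line for line into Python. The sketch below assumes the 1930-34 rows of Worksheet No. 61 as printed, including the July 1931 figure of 484.2, and uses `statistics.median` from the standard library; `median_index` is a hypothetical helper name.

```python
from statistics import median

# Worksheet No. 61: passenger car production, 1930-1934 (thousands), Jan. through Dec.
data = {
    1930: [233.8, 280.0, 330.9, 372.9, 361.4, 285.9, 222.2, 183.9, 176.4, 113.8, 101.1, 122.3],
    1931: [138.3, 180.4, 231.2, 286.9, 271.5, 210.4, 484.2, 155.4, 109.2, 58.4, 49.2, 97.9],
    1932: [98.8, 94.1, 99.4, 120.9, 157.8, 160.3, 94.7, 75.9, 64.7, 35.1, 47.5, 86.1],
    1933: [109.8, 90.1, 97.5, 149.8, 180.7, 207.6, 191.3, 191.4, 157.4, 104.9, 42.4, 50.8],
    1934: [112.8, 186.8, 279.3, 288.4, 273.8, 261.3, 223.1, 183.5, 125.0, 84.0, 49.0, 111.1],
}

def median_index(rows):
    medians = [median(month) for month in zip(*rows)]    # step 2: median of each month
    monthly_average = sum(medians) / 12                   # steps 3 and 4
    index = [100 * m / monthly_average for m in medians]  # step 5
    return medians, monthly_average, index

medians, avg, index = median_index(data.values())
print(round(sum(medians), 1), round(avg, 2))   # 2054.8 171.23
print([round(i, 1) for i in index])
```

Passing the rows for another span of years to the same function gives the medians behind Indexes No. 2 and No. 3; for an even number of years `statistics.median` averages the two middle values.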

WORKSHEET NO. 61

Monthly Passenger Car Production in the United States (in 1,000's)*

Year     Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.
1930    233.8  280.0  330.9  372.9  361.4  285.9  222.2  183.9  176.4  113.8  101.1  122.3
1931    138.3  180.4  231.2  286.9  271.5  210.4  484.2  155.4  109.2   58.4   49.2   97.9
1932     98.8   94.1   99.4  120.9  157.8  160.3   94.7   75.9   64.7   35.1   47.5   86.1
1933    109.8   90.1   97.5  149.8  180.7  207.6  191.3  191.4  157.4  104.9   42.4   50.8
1934    112.8  186.8  279.3  288.4  273.8  261.3  223.1  183.5  125.0   84.0   49.0  111.1
1935    227.6  273.6  359.4  387.2  305.5  294.2  274.3  181.1   56.1  213.3  336.9  343.0
1936    297.7  224.2  343.9  410.4  384.9  375.3  371.9  209.4   90.1  190.2  341.1  425.4
1937    309.5  296.8  403.9  440.0  425.4  411.4  360.4  311.6  118.7  298.7  295.2  244.4
1938    155.5  139.4  174.1  176.1  155.0  136.5  106.8   58.6   65.2  187.5  320.3  326.0
1939    281.5  243.0  299.7  273.4  237.9  246.7  150.7   61.4  161.6  251.8  285.3  373.8
1940    363.0  337.8  352.9  362.7  325.7  286.0  168.8   46.8  224.5  421.3  407.1  396.5
1941    411.2  394.6  410.2  375.0  417.7  419.0  343.7   78.5  167.8  295.6  266.1  175.0

No. 1 Index Based on 5 Years, 1930-34, Inclusive
Medians 112.8  180.4  231.2  286.9  271.5  210.4  222.2  183.5  125.0   84.0   49.0   97.9
Index    65.8  105.4  135.0  167.6  158.6  122.8  129.8  107.2   73.0   49.0   28.6   57.2

No. 2 Index Based on 10 Years, 1930-39, Inclusive
Medians 191.6  205.6  289.5  287.6  272.6  254.0  222.6  182.3  114.0  150.6  198.2  183.4
Index    90.0   96.7  136.2  135.2  128.2  119.5  104.6   85.7   53.6   70.8   93.2   86.3

No. 3 Index Based on 4 Years, 1938-41, Inclusive
Medians 322.2  290.4  326.3  318.0  281.8  264.3  159.7   60.0  164.7  273.7  302.8  349.9
Index   124.2  111.9  125.7  122.5  108.6  101.8   61.5   23.2   63.6  105.5  116.7  134.8

* Standard and Poor's Trade and Securities, Basic Units, G. 13.
[Fig. 81. Changes in seasonal variation of automobile production in U.S. Three panels:
Index for 5 Years, 1930-34. New models in December. A fairly correct measure of seasonal for the period when new models were introduced in December. Not good for later periods.
Index for 10 Years, 1930-39. Introduction of new models changed from December to October. This index is a composite for two periods and is of no value for either.
Index for 4 Years, 1938-41. New models introduced in October. This is a fairly correct index for this period, but would not be correct for the earlier period.]

Worksheet No. 61 and Fig. 81 are clear demonstrations of the results that follow the mixing of heterogeneous data. During the period 1930 to 1934, in which the new models were introduced in December, the lowest point in the seasonal index naturally falls in November. Since the old models were going out of production and purchasers were waiting for new ones, November, previous to 1935, was the month of least activity. Index No. 1, based on the homogeneous data of those years, is quite accurate.

After 1935 the new models were introduced in September-October. This change radically altered the seasonal index of automobile production. After this date the old models went out of production by September, and purchasers delayed buying new cars until the new models were out. This change made August the low month of the year and December the high month. This pattern is shown in Index No. 3, which is quite accurate for the period after 1935.

Index No. 2 is wholly inaccurate because it is based on mixed data. It includes ten years, five years of the earlier period, 1930-34, mixed with five years of the later period, 1935-39. Although Index No. 2 is based on ten years, it is of no value at all. For any seasonal index to be correct, it must be based on homogeneous data. It is better to use a few years that will give a correct result than a larger number that no longer represents the production program.

This problem of changing seasonal variation is arising in many fields of business. When an oil field is opened in an agricultural community, or a coal mine is worked out, or a new factory is opened in a small town, there is a change in the seasonal variation of retail sales.

Second Method

Monthly Medians of Percentages of Original Data Adjusted to Equal 1200


WORKSHEET NO. 62

Monthly Percentages of Automobile Production Based on Annual Average as 100, Years 1938-41

Year     Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.
1938     93.2   83.5  104.4  105.6   92.9   81.9   64.0   35.1   39.1  112.4  192.1  195.5
1939    117.8  101.7  125.4  114.4   99.6  103.3   63.1   25.7   67.6  105.4  119.4  158.5
1940    117.9  109.8  114.7  117.9  105.8   92.9   54.8   15.2   72.9  136.9  132.3  128.8
1941    131.4  126.1  131.1  119.9  133.5  133.9  109.9   25.1   53.6   94.4   85.1   55.9
Medians 117.8  105.8  120.0  116.2  102.7   98.1   63.6   25.4   60.6  108.9  125.9  143.7
Index   118.9  106.8  121.1  117.3  103.7   99.1   64.2   25.7   61.1  110.0  127.1  145.0


The second method of computing a seasonal index involves the 
following steps: 

1. The monthly values of the original data for each year sepa- 
rately are reduced to percentages by dividing each monthly value 
by the monthly average of that year. The percentages in Work- 
sheet No. 62 were obtained from the automobile data for the years 
1938-1941 inclusive, in Worksheet No. 61. The total of the 
data for 1938 is 2,001, which, divided by 12, gives a monthly 
average of 166.75. Dividing the data for each month of 1938 by 
166.75 gives the percentages in the first line of Worksheet No. 62. 
The procedure for the other three years is identical with this.

2. Take the median percentage value for the percentages of 
each month. Since there are only four percentages for each 
month in this case, the median value is the average of the two 
middle values in size. 

3. Adjust the total of these twelve median percentage values to total 1200. This result is obtained by dividing the total of the 12 median values by 12 to obtain the monthly average of the medians, and then dividing this average back into each percentage median. The total in this case is 1,188.7, and 1,188.7 ÷ 12 = 99.06. Dividing each one of the percentage medians by 99.06 gives the Index, which totals 1200 for the 12 months and averages 100 for each month.
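The three steps of the Second Method can be sketched in Python. This sketch assumes the 1938-41 rows of Worksheet No. 61, whose 1938 total is 2,001. It carries full precision instead of rounding the percentages to one decimal first, so some months differ slightly from the printed medians (the August median, 25.4, agrees). The helper name is hypothetical.

```python
from statistics import median

# Worksheet No. 61, 1938-1941 (thousands of cars), Jan. through Dec.
data = {
    1938: [155.5, 139.4, 174.1, 176.1, 155.0, 136.5, 106.8, 58.6, 65.2, 187.5, 320.3, 326.0],
    1939: [281.5, 243.0, 299.7, 273.4, 237.9, 246.7, 150.7, 61.4, 161.6, 251.8, 285.3, 373.8],
    1940: [363.0, 337.8, 352.9, 362.7, 325.7, 286.0, 168.8, 46.8, 224.5, 421.3, 407.1, 396.5],
    1941: [411.2, 394.6, 410.2, 375.0, 417.7, 419.0, 343.7, 78.5, 167.8, 295.6, 266.1, 175.0],
}

def percentage_median_index(data):
    # Step 1: each month as a percentage of its own year's monthly average
    pct = [[100 * v / (sum(row) / 12) for v in row] for row in data.values()]
    # Step 2: median percentage for each month (average of the two middle values here)
    medians = [median(col) for col in zip(*pct)]
    # Step 3: divide by the average median so the twelve values total 1200
    mean_of_medians = sum(medians) / 12
    return medians, [100 * m / mean_of_medians for m in medians]

medians, index = percentage_median_index(data)
print(round(sum(data[1938]) / 12, 2))   # 166.75
print(round(sum(index), 6))             # 1200.0
```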

This method gives an Index slightly different but closely re- 
sembling Index No. 3 in Worksheet No. 61. It is less likely to be 
biased by erratic original data, since each year is reduced to per- 
centages with a separate base. It requires a longer time for 
computation, but is more likely to approximate the true seasonal 
variation than the First Method. Like the First Method it does 
not effectively eliminate the influence of cycle or trend. Both of 
these methods are designed for quick, easy results rather than a 
high degree of accuracy. If designed and computed with rea- 
sonable care, they are both dependable in cases in which the 
secular trend is small. 


Third Method 

Deviation from Least Squares Trend 
WORKSHEET NO. 63

Seasonal Index Based on Percentage Deviations of Original Data from Least Squares Trend of Automobile Production, 1938-41

Year           Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.
1938 Data     155.5  139.4  174.1  176.1  155.0  136.5  106.8   58.6   65.2  187.5  320.3  326.0
     Trend    274.9  275.0  275.2  275.3  275.4  275.6  275.7  275.8  276.0  276.1  276.2  276.3
     %         56.6   50.7   63.3   64.0   56.3   49.5   38.7   21.2   23.6   67.9  116.0  118.0
1939 Data     281.5  243.0  299.7  273.4  237.9  246.7  150.7   61.4  161.6  251.8  285.3  373.8
     Trend    276.5  276.6  276.7  276.9  277.0  277.1  277.2  277.4  277.5  277.6  277.8  277.9
     %        101.8   87.9  108.3   98.8   85.9   89.0   54.4   22.1   58.2   90.7  102.7  134.5
1940 Data     363.0  337.8  352.9  362.7  325.7  286.0  168.8   46.8  224.5  421.3  407.1  396.5
     Trend    278.0  278.2  278.3  278.4  278.6  278.7  278.8  278.9  279.1  279.2  279.3  279.5
     %        130.6  121.4  126.8  130.2  116.9  102.6   60.5   16.8   80.5  150.9  145.7  141.9
1941 Data     411.2  394.5  410.2  375.0  417.7  419.0  343.7   78.5  167.8  295.6  266.1  175.0
     Trend    279.6  279.7  279.8  280.0  280.1  280.2  280.4  280.5  280.6  280.8  280.9  281.0
     %        147.1  141.0  146.6  133.9  149.1  149.5  122.6   28.0   59.8  105.3   94.7   62.3
Percentage
  Medians     116.2  104.7  117.6  114.5  101.4   95.8   57.5   21.7   59.0   98.0  109.4  126.2
Index         124.3  112.0  125.8  122.5  108.4  102.5   61.4   23.2   63.1  104.8  117.0  135.0
Steps in computation of Third Method:

1. Divide the monthly data for each month by the least squares
trend value for that month. This operation gives a series of
percentages showing the relation of data to trend.

2. Locate the median percentage values for each month.

3. Total the twelve median percentage values and divide by
12 to secure the average median percentage value. In this case
the figures are 1,122.0 ÷ 12 = 93.5.

4. Divide the average median percentage into each one of the
monthly medians. E.g.,

116.2 ÷ 93.5 × 100 = 124.3 for January,
104.7 ÷ 93.5 × 100 = 112.0 for February, . . .,
126.2 ÷ 93.5 × 100 = 135.0 for December.


The result is an index of seasonal change which totals 1200 and 
averages 100 per month. 
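The four steps can be sketched in Python. This is a hedged illustration: the straight-line least squares fit below (with x = 0, 1, 2, ... counted by months over the data supplied) is an assumption about how the trend is obtained, so its trend values may differ slightly from those in Worksheet No. 63. The function names are hypothetical.

```python
from statistics import median


def least_squares_trend(values):
    """Fit y = a + b*x by least squares, with x = 0, 1, 2, ... for successive months."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = sum(values) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return [a + b * x for x in range(n)]


def ratio_to_trend_index(monthly):
    """monthly: consecutive monthly values starting with a January, whole years only."""
    trend = least_squares_trend(monthly)
    # Step 1: each month's data as a percentage of its trend value.
    pct = [100 * d / t for d, t in zip(monthly, trend)]
    # Step 2: median percentage for each calendar month (every 12th item).
    medians = [median(pct[m::12]) for m in range(12)]
    # Steps 3-4: divide by the average median so the index averages 100.
    avg = sum(medians) / 12
    return [100 * m / avg for m in medians]
```

A series that is a perfectly straight line has no seasonal, so every month of its index comes out at 100; any series gives an index totaling 1200.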

The method of computing seasonal index explained in Work- 
sheet No. 63 has the following advantages: 

1. It eliminates all or much of trend from the seasonal index. 

2. If six to ten years of data can be used, it also eliminates
a large part of the cyclical fluctuations through the canceling-out
process.

3. If one is required to compute or finds it advantageous for 
other purposes to compute a least squares trend, the seasonal 
index may be obtained by this method with very little additional 
labor. 


Fourth Method 

The Link Relative Method of Computing Seasonal Index 

This method is designed to eliminate trend, cycle, and accidental 
elements from the seasonal and give a pure measure of month- 
to-month change in a time series. It eliminates trend by the 
reduction of the original data to link relatives. These relatives 
are obtained by dividing the original data of each month by the 
original data of the preceding month. For the automobile data of the
years 1938, 1939, 1940, and 1941, the computations are as follows:
Jan., 1938 ÷ Dec., 1937 = 155.5 ÷ 244.4
Feb., 1938 ÷ Jan., 1938 = 139.4 ÷ 155.5
Mar., 1938 ÷ Feb., 1938 = 174.1 ÷ 139.4
. . .
Dec., 1941 ÷ Nov., 1941 = 175.0 ÷ 266.1

As these percentages are computed they are recorded in Work- 
sheet No. 64 below under the appropriate year and month. 


WORKSHEET NO. 64 

Computation of Link Relative Seasonal Index for Automobile 
Production in United States, 1938-41 


Year                        Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.   Jan.

1938                        63.6   89.6  121.9  101.1   88.0   88.0   78.2   54.8  111.3  287.6  170.8  101.8
1939                        86.3   86.3  123.3   91.2   87.0  103.7   61.1   40.7  263.2  155.8  113.3  131.0
1940                        97.1   93.1  104.5  102.8   89.8   87.8   59.0   27.7  479.7  187.6   96.6   97.4
1941                       103.7   95.9  104.0   91.4  111.4  100.3   82.0   22.8  213.8  176.2   90.0   65.4

Medians                     91.7   91.4  113.2   96.2   88.9   94.2   69.7   34.2  238.5  181.9  105.0   99.6   91.7
Chain relative             100.0   91.4  103.5   99.6   88.5   83.4   58.1   19.9   47.5   86.4   90.7   90.3   82.8
Correction                          1.4    2.8    4.3    5.6    7.1    8.6   10.1   11.5   13.0   14.4   15.8   17.2
Corrected chain relative   100.0   92.8  106.3  103.9   94.1   90.5   66.7   30.0   59.0   99.4  105.1  106.1  100.0
Index                      113.9  105.8  121.0  118.3  107.4  103.0   76.0   34.2   67.2  113.2  119.8  120.2


Steps in computing Link Relative Seasonal Index:

1. Compute the link relatives.

2. Obtain the monthly median of the link relatives.

3. Consolidate the link relatives into chain relatives as follows:

Setting January as 100%, multiply the February median link
relative, 91.4, by 100% and write the result in the chain relative
line under February: 91.4% × 100 = 91.4 = first chain
relative. Then multiply the March link relative, 113.2, by the
first chain relative, 91.4 (113.2% × 91.4 = 103.5), which gives the
second chain relative. Continue this process by months until the
second January is reached. Since the first January median, 91.7,
was not used at the beginning of the chain, it is transferred to the
second January. (All January medians in a given set of data are
identical.) Multiply this second January median, 91.7, by the
December chain relative, 90.3 (91.7% × 90.3 = 82.8). This gives a
second January chain relative of 82.8, although the first January
was set as 100. This difference between the two January chain
relatives indicates that there is a bias or trend in the data of 17.2
(100.0 - 82.8 = 17.2).

4. Since the January value in the completed index must be
identical for all Januaries, this bias must be removed. This is
done by raising or lowering the whole list of chain relatives in
proportion as follows:

   1. Divide: (100 - 82.8) ÷ 12 = 17.2 ÷ 12 = 1.43, the monthly correction.

   2. Add multiples of this correction factor to the chain relative
   of each month beginning with February, as follows:
   1 × 1.43, or 1.4, to February; 2 × 1.43, or 2.8, to March;
   3 × 1.43, or 4.3, to April; and so on till 12 × 1.43, or 17.2, is
   added to the second January, making it 100. If in any case
   the second January is higher than the first January, or 100,
   the correction figure is subtracted in multiples from
   the successive months till the second January also equals
   100.


5. The corrected chain relatives are then adjusted to equal 
1200 and average 100 in the final Index. This is done by summing 
the 12 corrected chain relatives from the first January to De- 
cember inclusive, dividing this total by 12, and then dividing 
that quotient into each of the original corrected chain relatives. 
In this case, the figures are: 

100.0 + 92.8 + 106.3 + . . . + 106.1 = 1,053.9;   1,053.9 ÷ 12 = 87.825.

1 Theoretically and practically, the geometric method is the correct method 
for correcting the bias of the link relative seasonal index. It is so long and
complicated that the arithmetic method shown above is frequently used as 
a shorter approximation and ordinarily gives a sufficiently accurate result. 
Corrected Index: 100.0 ÷ 87.825 × 100 = 113.9 for January;
each of the other corrected chain relatives is divided by 87.825 in
the same way, giving the Index line of Worksheet No. 64, from 105.8
for February . . . to 120.2 for December.
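Step 1 and steps 3 through 5 can be sketched in Python. This is a minimal sketch using the arithmetic correction described above, not the geometric method mentioned in the footnote; chain relatives are carried unrounded, so results can differ from Worksheet No. 64 in the first decimal place. Step 2 (the monthly medians) is left to the reader, and the function names are hypothetical.

```python
def link_relatives(series):
    """Step 1: each month as a percentage of the preceding month."""
    return [100 * cur / prev for prev, cur in zip(series, series[1:])]


def link_relative_index(median_links):
    """median_links: 12 median link relatives, Jan..Dec (Jan = Jan over the preceding Dec)."""
    # Step 3: chain relatives, with the first January set at 100.
    chain = [100.0]
    for link in median_links[1:]:              # February through December
        chain.append(chain[-1] * link / 100)
    second_jan = chain[-1] * median_links[0] / 100
    # Step 4: spread the gap between the two Januaries evenly over the months.
    # A second January above 100 gives a negative step, lowering later months.
    step = (100 - second_jan) / 12
    corrected = [c + k * step for k, c in enumerate(chain)]
    # Step 5: divide by the average corrected chain relative (index averages 100).
    avg = sum(corrected) / 12
    return [100 * c / avg for c in corrected]


medians = [91.7, 91.4, 113.2, 96.2, 88.9, 94.2, 69.7, 34.2, 238.5, 181.9, 105.0, 99.6]
print(round(link_relative_index(medians)[0], 1))
```

With the median link relatives of Worksheet No. 64, the sketch gives about 113.9 for January, in agreement with the worksheet.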

Evaluation of the Link Relative Method 

The link relative method is long and tedious. Theoretically,
the link relative, in which the percentage of seasonal change for
each month is computed with the value of the previous month
as a base of 100, tends to eliminate secular trend. The correction
for the bias in the chain relatives tends to eliminate the
effects of cycle and accidental factors. Even with all these
refinements, errors in the representativeness of the medians are
probably compounded by the multiplication used to consolidate
link relatives into chain relatives, unless the sample contains
eight or ten years of data. With proper experience and care in the
selection of data for the index, this method gives dependable
results. It is too long for any except careful scientific studies.

Fifth Method 

Moving Average Method of Computing Seasonal Index 

This method is approximately as long as the Fourth Method, 
but is theoretically simpler and more accurate. It is similar to the
Third Method in that it is a trend procedure. The only difference 
is that instead of the trend being based on the somewhat stiff 
and artificial least squares method, it rests on the more flexible 
moving average. The moving average has the quality of weaving 
up and down as it passes through the data eliminating most or 
all of the cycle as well as the trend. If the seasonal variation of a 
time series has any considerable degree of regularity, a twelve-month
moving average should separate such variation from all
the other factors of the series. A moving average may be defined
as an average of a fixed number of items in a time series
which moves through the series by dropping the top item of the
previous averaged group and adding the next item below in each
successive average. In the preceding worksheet, 12 items are
averaged at one time. In the first case the values of the 12 months
of 1938 are totaled and the total, 2,001.0, is written between June
and July. For the next step, the January, 1938, value, 155.5, is
dropped and the January, 1939, value, 281.5, is added to total
2,127.0. Then the February, 1938, value, 139.4, is dropped and
the February, 1939, value, 243.0, is added. Each time 12 items
are averaged, but the items drop one month lower each time.

The purpose of column No. 4 in Worksheet No. 65 is to center
the moving average (or total) in the middle of the month. Since
there are 12 months in a year, if the moving average is placed
where it should be, that is, in the exact center of the period
averaged, it will fall between June and July, or at July 1, as is
shown in Worksheet No. 65. The 12-month moving totals are all
written in the exact center of the period averaged, or between the
months. If there were 13 or any odd number of months in the
year, the middle of the year would automatically fall in the middle
of a month. Since, however, the year has an even number of
months, six months always fall on either side of the moving
average. If we add the two adjacent 12-month moving totals, one
located at the first of July and the next at the first of August, the
combined total (and average) will be centered halfway between
them, that is, in the middle of July. The only purpose of
column No. 4 in Worksheet No. 65 is to center the total and
average in this way. The step is needed because the value of any
time period must be placed at the center of that period. The same
result may be obtained by running a 13-month weighted moving
total in the first computation and eliminating column No. 3 from
Worksheet No. 65 altogether. This 13-month moving total
method requires a somewhat greater mastery of the calculating
machine and a high degree of concentration and accuracy. It is
a shorter and better method for the good student, but it is likely
to prove longer and more difficult for the poor student because
of the time lost through errors made.
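The centering described above can be sketched in Python. This is an illustrative sketch and the function name is hypothetical. Adding two adjacent 12-month totals is the same as a 13-month total in which the two end months carry half the weight of the others, so the centered average is that total divided by 24. With the twelve months of 1938 and January, 1939, the July, 1938, value is 4,128.0 ÷ 24 = 172.0.

```python
def centered_12_month_average(series):
    """2 x 12 centered moving average; None where the 13-month window runs off an end."""
    out = [None] * len(series)
    for i in range(6, len(series) - 6):
        # 13-month weighted total: the two end months once, the 11 middle months twice.
        # This equals the sum of the two 12-month totals on either side of month i.
        total = series[i - 6] + 2 * sum(series[i - 5:i + 6]) + series[i + 6]
        out[i] = total / 24
    return out


# January 1938 through January 1939 (automobile production, Worksheet No. 66)
data = [155.5, 139.4, 174.1, 176.1, 155.0, 136.5, 106.8,
        58.6, 65.2, 187.5, 320.3, 326.0, 281.5]
print(round(centered_12_month_average(data)[6], 1))   # 172.0, centered on July 1938
```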

The method used in column No. 3 of Worksheet No. 66 is
shorter and better for all students sufficiently skilled to use it,
because it eliminates the work in column No. 3 in Worksheet
No. 65. For the beginning student, however, who lacks accuracy,
speed, and skill with the calculating machines, the longer method
will be easier. He may begin with the method in Worksheet
No. 65 and later shift to the method in Worksheet No. 66.

WORKSHEET NO. 66

Short Method for Computing 12-Month Moving Average
from a Centered 24-Month Moving Total

Column (1): 12-Month Total Centered Between June and July
Column (2): 12-Month Total Centered Between July and August
Column (3): Short Method, 13-Month Weighted Moving Total
            Centered on Mid-July (1 + 2)

Month        (1)       (2)             (3)

Jan.       155.5                     155.5
Feb.       139.4     139.4   139.4 + 139.4
Mar.       174.1     174.1   174.1 + 174.1
Apr.       176.1     176.1   176.1 + 176.1
May        155.0     155.0   155.0 + 155.0
June       136.5     136.5   136.5 + 136.5
July       106.8     106.8   106.8 + 106.8
Aug.        58.6      58.6     58.6 + 58.6
Sept.       65.2      65.2     65.2 + 65.2
Oct.       187.5     187.5   187.5 + 187.5
Nov.       320.3     320.3   320.3 + 320.3
Dec.       326.0     326.0   326.0 + 326.0
Jan.                 281.5           281.5

Total    2,001.0   2,127.0         4,128.0

The percentages in column No. 6 of Worksheet No. 65 are 
transferred to the appropriate monthly column in array order in 
Worksheet No. 67. The monthly medians are taken and the 
Index adjusted to total 1200 and to average 100, as is done in all
cases.
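The transfer, medians, and adjustment described above can be sketched in Python. This is a minimal sketch: it assumes the centered moving averages are already computed and aligned month for month with the data, with None wherever no average exists. The function name and the `first_month` parameter are hypothetical.

```python
from statistics import median


def ratio_to_moving_average_index(series, moving_avg, first_month=0):
    """series, moving_avg: aligned monthly lists (moving_avg holds None where undefined).

    first_month is the calendar month of series[0] (0 = January).
    """
    by_month = [[] for _ in range(12)]
    for i, (value, ma) in enumerate(zip(series, moving_avg)):
        if ma is not None:
            # Percentage of the centered moving average, filed under its calendar month.
            by_month[(first_month + i) % 12].append(100 * value / ma)
    medians = [median(p) for p in by_month]
    # Adjust so the index totals 1200 and averages 100.
    avg = sum(medians) / 12
    return [100 * m / avg for m in medians]
```

Each calendar month must receive at least one percentage, so the data should cover at least two full years once the six months lost at each end are set aside.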

WORKSHEET NO. 67

Computation of Seasonal Index of Automobile Production,
1938-1941, from Percentage Deviations from 12-Month
Moving Averages Centered

                      Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.

                     118.0  108.3  115.3  107.7  100.6   93.2   54.5   14.9   34.1   93.6  107.7  118.2
                     126.6  110.5  126.5  116.2  107.8  104.1   62.1   24.6   63.2   96.2  124.8  136.2
                     131.1  122.0  131.2  125.6  123.9  124.7   62.2   32.3   70.4  130.9  154.1  150.9

Medians              126.6  110.5  126.5  116.2  107.8  104.1   62.1   24.6   63.2   96.2  124.8  136.2
Index                126.7  110.6  126.6  116.3  107.9  104.2   62.2   24.7   63.3   96.3  124.9  136.3

Corrected medians    126.6  123.7  126.5  120.2  107.8  107.7   62.1   25.4   65.4   96.2  129.1  136.2
Corrected index      123.8  121.0  123.7  117.5  105.4  105.3   60.7   24.8   64.2   94.2  126.2  133.2

It will be recalled that earlier in this chapter reference was
made to the effects of the inequality of the length of the months
on all seasonal indexes. February ordinarily has 28 days; April,
June, September, and November, 30; and all the other months
have 31 days. In none of the indexes computed so far in this
chapter has any adjustment been made for the varying length of
months. In the two bottom lines of Worksheet No. 67 such an
adjustment is made. The adjustment was made by dividing the
February median, 110.5, by 89.3, which is the percentage length of
February as compared with January as 100, and by dividing the
April, June, September, and November medians by 96.7, which
is the relative length of each of these months as compared with
January as 100. No change was made in the medians for January,
March, May, July, August, October, or December. This simple
computation roughly adjusts the months to the same relative
length. The last index in Worksheet No. 67 is such an adjusted
index. In it we see that February is practically as high as January
and March, which seems reasonable in view of all known facts
concerning automobile production during this quarter of the
year. In this correction for length of month, we have assumed
that Sundays and holidays were evenly distributed among the
months.
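The length-of-month correction can be sketched in Python. This sketch is an assumption in its factors: it uses exact day counts (28/31 for February and 30/31 for the 30-day months, ignoring leap years) rather than rounded percentages, so its figures may differ slightly from Worksheet No. 67. Like the text, it ignores Sundays and holidays. The names are hypothetical.

```python
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)   # leap years ignored


def adjust_for_month_length(medians, days=DAYS_IN_MONTH):
    """Restate each monthly median as if the month had 31 days, then re-average to 100."""
    # Dividing by the month's length relative to a 31-day January (e.g. 28/31 for
    # February) is the same as multiplying by 31 / days.
    adjusted = [m * 31 / d for m, d in zip(medians, days)]
    avg = sum(adjusted) / 12
    index = [100 * a / avg for a in adjusted]
    return adjusted, index


# Medians from Worksheet No. 67
medians = [126.6, 110.5, 126.5, 116.2, 107.8, 104.1, 62.1, 24.6, 63.2, 96.2, 124.8, 136.2]
adjusted, index = adjust_for_month_length(medians)
```

Months with 31 days keep their medians unchanged; only February and the four 30-day months are raised before the index is re-averaged.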

WORKSHEET NO. 68

Comparison of Six Indexes of Seasonal Variation Computed
by Various Methods in this Chapter for
Automobile Data, 1938-41

Index    First  Second   Third  Fourth   Fifth  Corrected for Time

Jan.     124.2   118.9   124.3   113.9   126.7   123.8
Feb.     111.9   106.8   112.0   105.8   110.6   121.0
Mar.     125.7   121.1   125.8   121.0   126.6   123.7
Apr.     122.5   117.3   122.5   118.3   116.3   117.5
May      108.6   103.7   108.4   107.4   107.9   105.4
June     101.8    99.1   102.5   103.0   104.2   105.3
July      61.5    64.2    61.4    76.0    62.2    60.7
Aug.      23.2    25.7    23.2    34.2    24.7    24.8
Sept.     63.6    61.1    63.1    67.2    63.3    64.2
Oct.     105.5   110.0   104.8   113.2    96.3    94.2
Nov.     116.7   127.1   117.0   119.8   124.9   126.2
Dec.     134.8   145.0   135.0   120.2   136.3   133.2
Fig. 82. Comparison of six indices of seasonal variation for automobile
production computed by six different methods.*



This method eliminates one column from the worksheet. Worksheet
No. 69 has five columns instead of the six in Worksheet No. 65. This
is the best method for well-trained machine operators.

WORKSHEET NO. 70

Computation of Seasonal Index for Wholesale Egg Prices
at Chicago, 1936-41, from Percentage Deviations
from 12-Month Centered Moving Average
from Worksheet No. 69

             Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.

            116.2  119.0   96.2   94.3   93.3  102.4   96.7  100.5  111.0  121.6  139.0  129.9
            101.9   91.2   91.1   94.0   92.8   91.0   91.7   94.6  106.6  114.2  132.6  125.1
             97.1   85.5   88.8   90.1   88.5   87.9   89.0   93.5  105.2  112.9  132.5  122.8
             91.4   82.0   84.5   90.0   88.3   86.4   88.4   89.5  104.2  111.5  125.5  118.5
             90.6   78.8   80.5   85.6   85.7   85.7   85.9   88.0  102.2  106.8  124.9  107.3

Medians      97.1   85.5   88.8   90.1   88.5   87.9   89.0   93.5  105.2  112.9  132.5  122.8
Index        97.6   86.0   89.3   90.5   89.2   88.5   89.5   94.0  105.7  113.4  133.0  123.3


* In Fig. 82, the vertical scale is shown in full for only the bottom index
on the graph. The center line of 100% is shown for the other five indexes.
Steps in computation of Index:

1. Place the percentages of column No. 5 in Worksheet No. 69
in an array under each month in Worksheet No. 70.

2. Select the median percentage.

3. Total the 12 median percentages. In this case the total is
1,193.8.

4. Divide the total of median percentages by 12.

1,193.8 ÷ 12 = 99.5, the correction value.

5. Divide each of the monthly median percentages by 99.5.
The result is a seasonal variation that averages 100 and totals
1,200.

SUMMARY 

1. Seasonal variation is usually computed as a month-to-month change
expressed as a percentage, with the average for the years as the base of
100%.

2. The simplest method of computing a seasonal index is to reduce 
the twelve totals for the twelve months for the original data to percent- 
ages of their mean as a base of 100%. Such a result will include a large 
error if the secular trend is quite steep or the cycle quite violent. Other- 
wise the results obtained by this simple method are dependable. 

3. More exact results are obtained from each of the three methods: 
(1) deviations from the least squares trend, (2) link relative method, or 
(3) deviations from 12-month moving average. The latter method is 
usually superior to the other two because it largely eliminates the in- 
fluence of both secular trend and cyclical fluctuation from the seasonal 
index. 

4. A seasonal index to be dependable should be computed from data 
which are as uniform as possible. If there has been an organic or per- 
manent change in the nature of the seasonal variation in the recent past, 
it is still better to use a few years which truly represent the present
condition than to include a longer period of past years which are no longer 
representative of the variation. 

5. The seasonal index is easily understood and widely used by business- 
men in preparing their monthly budgets, inventories, and production 
activities. 


Since the purpose of the graph is to show only the comparative shape of the
several seasonal indexes, and an inclusion of a full scale for all five would lead
to overlapping and confusion, their scales are omitted. All six indexes are, however,
plotted on the same scale, that which is shown in the last index.
REVIEW QUESTIONS


1. What are the principal characteristics of seasonal variation? 

2. What are the chief causes of seasonal variation? 

3. In what terms is a seasonal index expressed? Why? 

4. What are the strong and weak points of the Monthly Medians of
Original Data Method of computing a seasonal index? Under what cir- 
cumstances should it be used? 

5. What is the advantage of the Second Method over the First Method
of computing seasonal variation? Is the difference large or small? Why? 

6. In what respects does the Third Method, or Deviation from Least 
Squares Trend Method, differ from the other four methods of computing 
seasonal variation? In what respects is it superior or inferior? 

7. What are the principal characteristics of the Link Relative Method 
of computing seasonal variation? In what respects is it undesirable? When should
it be used? Does it eliminate cycle and trend? 

8. Compare the Moving Average Method of computing seasonal vari- 
ation with all the other methods as to (1) ease of computation, (2) accuracy, 
(3) reasons for accuracy. 

9. Compare Worksheets No. 65 and No. 66 as to (1) ease of computa- 
tion, and (2) accuracy. Under what conditions should each be used? 

10. Of what practical value are seasonal indices in business manage- 
ment? 


EXERCISES 


1. Standard and Poor's Index of Stock Prices of Seven Copper Companies
(1935-1939 = 100).

Year   Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.

1935   42.0   40.1   36.9   42.2   54.6   48.6   52.3   60.2   65.6   68.2   73.8   79.3
1936   87.5  101.8  103.3  112.2  101.0  104.2  114.4  120.1  120.8  139.1  159.1  161.8
1937  167.8  180.2  188.7  163.4  153.2  153.1  159.2  169.9  149.2  108.9   87.6   95.9
1938  103.2  101.9   95.4   83.6   85.3   94.2  105.8  107.3  106.4  122.6  125.4  112.4
1939  114.1   97.4  105.8   81.2   83.6   85.2   94.6   95.4  115.5  111.8  105.5  102.0
1940  101.1  100.1   98.2   99.7   93.1   74.2   70.7   72.9   82.9   83.7   93.9   92.1
1941   91.3   82.0   82.9   82.4   86.8   89.0   96.4   94.5   93.5   86.1   83.9   88.2
2. Gasoline Consumption in 1,000,000 Barrels.

Year   Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.

1935   28.1   26.4   32.0   36.1   39.1   37.9   41.2   42.8   37.9   41.4   36.0   33.7
1936   32.3   27.2   35.9   38.9   42.0   44.5   46.7   47.0   44.4   44.2   40.0   38.6
1937   33.7   32.0   40.5   43.5   45.4   48.5   51.1   49.6   47.5   45.4   42.7   39.5
1938   38.2   31.9   41.3   43.3   44.9   48.3   47.5   50.5   46.0   46.3   45.0   41.7
1939   37.8   34.6   42.5   44.0   49.6   49.8   50.6   53.8   49.4   49.7   47.3   43.7
1940   40.4   37.6   44.6   47.7   52.9   55.5   53.9   55.9   52.3   53.8   49.1   46.4


3. Newspaper Advertising for 52 Cities, Million Lines.

Year   Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.

1937   99.6  103.1  126.1  131.1  130.8  121.3   99.2  103.7  117.3  135.0  119.7  122.3
1938   90.6   88.5  103.9  109.0  109.9   98.5   83.7   86.1  103.9  113.6  113.5  118.1
1939   87.4   86.7  111.8  111.2  112.1  105.1   89.4   90.5  101.9  119.6  113.5  118.1
1940   88.0   93.2  114.3  112.0  119.9  103.3   84.4   92.0  106.7  118.8  113.2  122.8
1941   93.2   94.0  114.4  119.2  122.4  108.4   88.8   95.7  107.2  123.8  120.6  125.5



CHAPTER 18 


CYCLICAL FLUCTUATIONS 


A business cycle consists of those relatively irregular fluctua- 
tions of a business above and below its normal or long-time average 
activity. The higher portions of a cycle are ordinarily called 
prosperity and the lower portions depression. There is no such 
thing as the business cycle. There are actually hundreds or even 
thousands of separate cycles. Each industry has its own peculiar 
cycle. Each corporation or even each separate plant in every 
industry tends to vary somewhat in its cyclical movements from 
other plants and firms in the same industry because of different 
conditions of location, personnel, product, management, credit, 
and other factors. Plants or industries located in the large cities, 
on the seacoast, or dependent on export markets may have quite 
a different cycle from inland, domestic, or local firms or plants in 
the same field. The cycles of stock and bond prices, interest 
rates, money in circulation, retail prices, wholesale prices, raw 
material prices, finished goods prices, and other related factors 
may begin and end at different times, have different lengths and 
amplitudes, and affect the management of the business in very 
different ways. For purposes of practical management, therefore, 
the use of business cycle information requires a detailed study of 
the specific cycles occurring in a particular business or phase of 
business. What is sometimes called the business cycle is only a 
generalized or average picture of large or total business movements. 
It is a broad generalization based on the average of many separate 
cycles and as such has the limitations of all broad generalizations 
in that it obscures or eliminates many useful and necessary details
for any specific industry or particular business firm. Local
and regional or national conditions vary so widely that a particular
business may be very prosperous in California or Texas
while it is depressed in Illinois or Massachusetts or vice versa.
In the study and in the application of the measurements of 
business cycles these many and wide variations should be kept in 
mind. 


Fig. 83. Half a century of American business, showing variations in
cycles. (Used with permission of the Cleveland Trust Company Business
Bulletin.)


Business cycles for single plants or industries, as well as the
average or generalized cycles, vary greatly in duration and amplitude.
The cycles of short-lived consumers' goods, such as bread,
cigarettes, shoes, etc., have small amplitude. The production
and prices of such goods do not fluctuate widely. Raw materials
and capital goods, on the other hand, such as wheat, copper,
iron, cotton, etc., have cycles of wide amplitude. The fluctuations
of successive cycles in the same series vary greatly, ranging
from ten to forty percent. The price of a loaf of bread of the same
weight rarely fluctuates outside a range of seven to twelve cents a
loaf. Wheat, on the other hand, varies from $0.25 to $2.25 a
bushel, or nine times the lowest value. Copper fluctuates from
$0.06 to $0.25 a pound, while copper wire and alarm clocks change
in price but a small fraction of the price variation of the raw
material.

As indicated in Fig. 83, the length of cycles varies as much as or
more than their amplitude. The shortest cycles are about two
years long while the longest ones are from seven to ten years in 
duration. Their average length is about three and one-half 
years. This wide variation in the length and amplitude of the 
cycle makes its measurement difficult and often inaccurate. Some- 
times a depression lasts but a few months while others endure for 
years. 
RESIDUAL METHOD

The traditional and still the most common method of meas- 
uring the cycle is to treat it as a residual. The trend and seasonal 
variation of a time series are removed by direct statistical attack 
through the use of averages and dispersions based on the original 
data. After these are taken out, the remainder is called the 
cycle. This method leaves a more or less impure cycle, containing 
all irregular movements and whatever of trend or seasonal change 
those computations failed to remove. This method is theoret- 
ically quite simple. It is like picking the beans and beads out of 
a box in which beans, beads, and buttons were originally mixed. 
The beans are picked out and put in one pile. That is like re- 
moving the trend. Then the beads are picked out and put into 
another pile. That is like taking out the seasonal variation. Then 
all that is left is called buttons, but it may include all the clasps, 
hooks and eyes, tacks, and dust and threads that have accumulated 
in the box over a long time. The residual method can never give 
a pure cycle. It is a conglomerate, a mixture including all move- 
ments except trend and seasonal variation, and if they have been 
incorrectly computed, some of their amplitude remains in the 
cyclical residual. This method is easy to compute and if care- 
fully done will give results that are sufficiently accurate for most 
practical purposes. 

The problem is somewhat like reducing a chemical compound. 
The elements must be separated, but it is a question of which one 
to eliminate first. The four major elements in a time series are 
Trend (T), Seasonal (S), Cycle (C), and Irregular Movements (I).
The compound which must be dissolved is, therefore,

(T × S × C × I),

if we consider the combined elements as bound together in a
multiplicative or ratio structure. If their relationship is additive,
the formula would be (T + S + C + I). The ratio structure is
frequently considered the correct one, but analyses are often 
made on the additive relation or on a combination of the two. The 
most commonly used formulas for the solution are: 
1. (C × I) = (T × C × S × I) ÷ (T × S)

2. (C × I) = (S × C × I) ÷ S, where S × C × I = (T × C × S × I) ÷ T

3. (C × I) = (T × C × I) ÷ T, where T × C × I = (T × S × C × I) ÷ S

4. (C + I) = T(S + C + I) ÷ T - S


The details of the first and fourth methods are illustrated in the 
worksheet on page 419. 
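The four formulas can be checked with made-up component values in Python. This is only an illustration with hypothetical numbers (T = 120; S = 0.95, C = 1.10, I = 1.02 written as ratios rather than percentages): under a purely multiplicative structure, formulas 1, 2, and 3 differ only in the order of the divisions and must give the same C × I, while formula 4 belongs to a structure in which cycle and irregular are added to the seasonal.

```python
import math

# Hypothetical components: trend in data units; S, C, I as ratios (0.95 = 95%).
T, S, C, I = 120.0, 0.95, 1.10, 1.02
data = T * S * C * I                  # multiplicative (ratio) structure, T x S x C x I

ci_1 = data / (T * S)                 # formula 1: divide out the normal value T x S
s_c_i = data / T                      # formula 2: remove trend first ...
ci_2 = s_c_i / S                      # ... then seasonal
t_c_i = data / S                      # formula 3: remove seasonal first ...
ci_3 = t_c_i / T                      # ... then trend

# In a purely multiplicative series the order of division does not matter.
assert all(math.isclose(v, C * I) for v in (ci_1, ci_2, ci_3))

# Formula 4 assumes Data = T(S + C + I), with S in percent and C + I in percentage points.
S_pct, ci_points = 95.0, 12.2         # hypothetical values
data_4 = T * (S_pct + ci_points) / 100
ci_4 = 100 * data_4 / T - S_pct       # percent of trend minus the seasonal index
assert math.isclose(ci_4, ci_points)
```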

The first method of computing cycle in Worksheet No. 71 is 
based on the assumption that the relationships among trend, 
seasonal, cycle, and irregular changes are all multiplicative or 
ratio connections. This means that if trend (T) and seasonal (S) 
are increasing, the increase is not their sum but their product. 
It is not T + S but TS. It may be illustrated with arithmetic 
numbers; if T = 120, and S = 95%, then TS = 120 X .95 = 114. 
Since seasonal variation is computed as a percentage, this relation- 
ship is perfectly logical. If the original data are divided by T, 
on the assumption that the relationship is multiplicative, the re- 
sult is a ratio, or percentage. Suppose that — = 110%. If 

S = .95, then - 115.8 — 100 = 15.8. But if the relationship 
is considered to be additive, 110 — 95 = 15. In the first case the 
result is 15.8. By the second method it is 15. Which is correct? 
There is no positive or categorical answer. From what is known, 
the relationships seem to vary with different series. Sufficient 
research has not been done to prove which method is preferable 
in every case or even in many cases. The most that can be done 
is to evaluate the logical relationships which exist among trend, 
seasonal, and cycle in a given series and use the method that seems 
to meet the logical relationships in each particular series. If it 
seems that the cycle expands in proportion as the trend and sea- 
sonal rise, and contracts as the trend and seasonal fall, the ratio 
method should be used. But if there is no ratio increase in the 
cycle as the trend and cycle increase, their relation is only a 
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WORKSHEET NO. 71 

Two Alternative Methods of Computing the Cyclical-Irregular 
Movements in the Wholesale Price op Fresh Eggs 
IN Chicago, 1940 

Method No. 1 


Year 

and 

Month 

Original 

Data 

TXSXCXI 

Trend 

Values 

T 

Seasonal 

Index 

Normal 

Values 

TXS 

Cyclical-Irregular 

Percentages 

TXSXCXI 

TXS 

Cycle 

(CX/l-lOO 

1941 

Jan. 

18.4 

22.246 

103.8 

23.09 

79.7 

- 20.3 

Feb. 

16.7 

22.274 

84.4 

18.80 

88.8 

- 11.2 

Mar. 

17.8 

22.302 

87.0 

19.40 

91.7 

- 8.3 

Apr. 

21.6 

22.330 

87.4 

19.52 

110.7 

+ 10.7 

May 

22.3 

22.358 

83.4 

18.65 1 

119.6 

+ 19.6 

June 

25.4 

22.386 

79.4 

17.77 

142 9 

+ 42.9 

July 

26.1 

22.414 

89.4 

20.04 

130.2 

+ 30.2 

Aug. , 

27.7 

22.442 

97.3 

21.84 

126.8 

+ 26.8 

Sept. 

29.0 

22.470 

106.3 

23.89 

121.4 

+ 21.4 

Oct. 

31.0 

22.498 ! 

118.3 

26.62 

116.5 

+ 16.5 

Nov. 

36.0 

22.526 

136.3 

30.70 

117.3 

+ 17.3 

Dec. 

34.5 

22.554 

127.0 

28.64 

120.5 

+ 20.5 


Method No. 4

Year and    Original    Trend     Percent of Trend    Seasonal   Cyclical-Irregular
Month       Data        Values    (S+C+I)             Index      Percentages
            T(S+C+I)    T         T(S+C+I)/T          S          (S+C+I) - S
1941 Jan.   18.4        22.246    82.71               103.8      -21.1
     Feb.   16.7        22.274    74.97               84.4       -9.4
     Mar.   17.8        22.302    79.81               87.0       -7.2
     Apr.   21.6        22.330    96.73               87.4       +9.3
     May    22.3        22.358    99.74               83.4       +16.3
     June   25.4        22.386    113.46              79.4       +34.1
     July   26.1        22.414    116.44              89.4       +27.0
     Aug.   27.7        22.442    123.43              97.3       +26.1
     Sept.  29.0        22.470    129.06              106.3      +22.7
     Oct.   31.0        22.498    137.79              118.3      +19.5
     Nov.   36.0        22.526    159.81              136.3      +23.5
     Dec.   34.5        22.554    152.97              127.0      +25.9
matter of the sum of the parts. This means that T + S + C + I 
= Data, instead of T × S × C × I = Data. 

Method No. 1 in effect says, “This trend which we do have 
times this seasonal which we do have gives us a clear measure of 
the business in its normal activities. The product T × S divided 
into the data gives us the actual cyclical fluctuation.” The whole 
approach is realistic and positive. For this reason Method No. 1 
is usually to be preferred in computing the cycle. Method No. 4 
is based on the idea that the cycle does not vary in size with variation 
in the amplitude of trend and seasonal. In other words, the 
cycle is only an added fluctuation which may be subtracted out. 
In some series this seems to be the case. Method No. 4 also has the 
advantage of being less laborious to compute. It requires only six columns 
instead of seven. It substitutes one division and a simple subtraction 
for either one multiplication, one division, and one subtraction, 
or two divisions and one subtraction. It seems probable 
that in many cases the cycle does not expand in ratio to the 
amplitude of seasonal. In other words, its relation to seasonal is 
additive instead of multiplicative. Method No. 1 is used in Worksheet 
No. 72 and Method No. 4 in Worksheet No. 73. 
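For readers who want to check the arithmetic by machine, the two computations can be sketched in Python as follows. This is only an illustration under stated assumptions: the figures are the 1941 values of Worksheet No. 71, the function names `cycle_ratio` and `cycle_difference` are hypothetical, and rounding the normal value T × S to two decimals is meant to imitate the worksheet.

```python
# Cycle of Chicago fresh-egg prices, 1941, computed by the two methods of
# Worksheet No. 71. Figures are copied from the worksheet; names are illustrative.

MONTHS = ["Jan.", "Feb.", "Mar.", "Apr.", "May", "June",
          "July", "Aug.", "Sept.", "Oct.", "Nov.", "Dec."]
DATA = [18.4, 16.7, 17.8, 21.6, 22.3, 25.4, 26.1, 27.7, 29.0, 31.0, 36.0, 34.5]
SEASONAL = [103.8, 84.4, 87.0, 87.4, 83.4, 79.4, 89.4, 97.3, 106.3, 118.3, 136.3, 127.0]
# The straight-line trend rises 0.028 a month from 22.246 in January.
TREND = [22.246 + 0.028 * i for i in range(12)]


def cycle_ratio(y, t, s):
    """Method No. 1 (multiplicative, Data = T x S x C x I).

    The normal value T x S is divided into the data; the cycle is the
    percentage above or below 100."""
    normal = round(t * s / 100, 2)  # the seasonal index is in percent
    return round(y / normal * 100 - 100, 1)


def cycle_difference(y, t, s):
    """Method No. 4 (additive).

    The data are expressed as a percentage of trend, and the seasonal
    index is subtracted from that percentage."""
    percent_of_trend = y / t * 100
    return round(percent_of_trend - s, 1)


print("Month   Method 1   Method 4")
for month, y, t, s in zip(MONTHS, DATA, TREND, SEASONAL):
    print(f"{month:6}{cycle_ratio(y, t, s):+9.1f}{cycle_difference(y, t, s):+11.1f}")
```

Apart from rounding, the two results are tied together: if P is the percent of trend and R the Method No. 1 cycle, then R = 100(P - S)/S, so the Method No. 4 figure P - S equals R × S / 100. The methods therefore agree closely when the seasonal index is near 100, as in January, and differ most in months such as June and November, where it is far from 100.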

WORKSHEET NO. 72*

Complete Analysis of the Trend, Seasonal Variation, and Cycle
of the Wholesale Price of Fresh Eggs, Chicago, 1936-1941

    1              2        3        4        5          6         7           8        9
Year and    Original  Secular Seasonal   Normal Percentage     Cycle     Squared    Cycle
Month           Data    Trend    Index    3 × 4      Cycle   Percent  Percentage     in σ
                                                     2 ÷ 5                 Cycle
1936 Jan.       23.2   20.566    103.8    21.35      108.7      +8.7       75.69     +.49
     Feb.       27.5   20.594     84.4    17.38      158.2     +58.2    3,387.24    +3.29
     Mar.       19.6   20.622     87.0    17.94      109.2      +9.2       84.64     +.52
     Apr.       19.2   20.650     87.4    18.05      106.4      +6.4       40.96     +.36
     May        20.2   20.678     83.4    17.25      117.1     +17.1      292.41     +.97
     June       21.0   20.706     79.4    16.44      127.7     +27.7      767.29    +1.56
     July       21.4   20.734     89.4    18.54      115.4     +15.4      237.16     +.87
     Aug.       22.6   20.762     97.3    20.20      111.9     +11.9      141.61     +.67
     Sept.      24.8   20.790    106.3    22.10      112.2     +12.2      148.84     +.69
     Oct.       27.4   20.818    118.3    24.63      111.2     +11.2      125.44     +.63
     Nov.       33.5   20.846    136.3    28.41      117.9     +17.9      320.41    +1.01
     Dec.       29.6   20.874    127.0    26.50      111.7     +11.7      136.89     +.66
1937 Jan.       23.2   20.903    103.8    21.70      106.9      +6.9       47.61     +.39
     Feb.       21.7   20.930     84.4    17.66      122.9     +22.9      524.41    +1.29
     Mar.       22.6   20.958     87.0    18.23      124.0     +24.0      576.00    +1.36
     Apr.       21.8   20.986     87.4    18.34      118.9     +18.9      357.21    +1.07
     May        20.1   21.014     83.4    17.53      114.7     +14.7      216.09     +.83
     June       19.1   21.042     79.4    16.71      114.4     +14.4      207.36     +.81
     July       20.0   21.070     89.4    18.84      106.2      +6.2       38.44     +.35
     Aug.       20.1   21.098     97.3    20.53       97.9      -2.1        4.41     -.12
     Sept.      22.2   21.126    106.3    22.46       98.8      -1.2        1.44     -.07
     Oct.       22.1   21.154    118.3    25.03       88.3     -11.7      136.89     -.66
     Nov.       25.6   21.182    136.3    28.87       88.7     -11.3      127.69     -.64
     Dec.       24.3   21.210    127.0    26.94       90.2      -9.8       96.04     -.55
1938 Jan.       20.9   21.238    103.8    22.05       94.8      -5.2       27.04     -.29
     Feb.       16.9   21.266     84.4    17.95       94.1      -5.9       34.81     -.33
     Mar.       17.4   21.294     87.0    18.53       93.9      -6.1       37.21     -.34
     Apr.       17.8   21.322     87.4    18.64       95.5      -4.5       20.25     -.25
     May        19.5   21.350     83.4    17.81      109.5      +9.5       90.25     +.54
     June       19.3   21.378     79.4    16.97      113.7     +13.7      187.69     +.77
     July       20.3   21.406     89.4    19.14      106.1      +6.1       37.21     +.34
     Aug.       21.0   21.434     97.3    20.86      100.7      +0.7         .49     +.04
     Sept.      23.1   21.462    106.3    22.81      101.3      +1.3        1.69     +.07
     Oct.       25.3   21.490    118.3    25.42       99.5      -0.5         .25     -.03
     Nov.       27.3   21.518    136.3    29.33       93.1      -6.9       47.61     -.39
     Dec.       25.4   21.546    127.0    27.36       92.8      -7.2       51.84     -.41
1939 Jan.       18.1   21.574    103.8    22.39       80.8     -19.2      368.64    -1.08
     Feb.       16.5   21.602     84.4    18.23       90.5      -9.5       90.25     -.54
     Mar.       16.6   21.630     87.0    18.82       88.2     -11.8      139.24     -.66
     Apr.       16.4   21.658     87.4    18.93       86.6     -13.4      179.56     -.76
     May        15.8   21.686     83.4    18.09       87.3     -12.7      161.29     -.72
     June       15.3   21.714     79.4    17.24       88.7     -12.3      151.29     -.69
     July       15.4   21.742     89.4    19.44       79.2     -20.8      432.64    -1.17
     Aug.       15.5   21.770     97.3    21.18       73.2     -26.8      718.24    -1.51
     Sept.      18.2   21.798    106.3    23.17       78.5     -21.5      462.25    -1.21
     Oct.       20.1   21.826    118.3    25.82       77.8     -22.2      492.84    -1.25
     Nov.       23.6   21.854    136.3    29.79       79.3     -20.7      428.49    -1.17
     Dec.       19.1   21.882    127.0    27.79       68.7     -31.3      979.69    -1.77
1940 Jan.       20.8   21.910    103.8    22.74       91.5      -8.5       72.25     -.48
     Feb.       21.3   21.938     84.4    18.52      115.0     +15.0      225.00     +.85
     Mar.       16.4   21.966     87.0    19.11       85.8     -14.2      201.64     -.80
     Apr.       16.4   21.994     87.4    19.22       85.3     -14.7      216.09     -.83
     May        16.5   22.022     83.4    18.37       89.8     -10.2      104.04     -.58
     June       15.6   22.050     79.4    17.51       89.1     -10.9      118.81     -.62
     July       15.8   22.078     89.4    19.74       80.0     -20.0      400.00    -1.13
     Aug.       16.3   22.106     97.3    21.51       75.8     -24.2      585.64    -1.37
     Sept.      19.3   22.134    106.3    23.53       82.0     -18.0      324.00    -1.03
     Oct.       20.3   22.162    118.3    26.22       77.4     -22.6      510.76    -1.28
     Nov.       23.6   22.190    136.3    30.24       78.0     -22.0      484.00    -1.24
     Dec.       25.2   22.218    127.0    28.22       89.3     -10.7      114.49     -.60
1941 Jan.       18.4   22.246    103.8    23.09       79.7     -20.3      412.09    -1.15
     Feb.       16.7   22.274     84.4    18.80       88.8     -11.2      125.44     -.64
     Mar.       17.8   22.302     87.0    19.40       91.7      -8.2       67.24     -.46
     Apr.       21.6   22.330     87.4    19.52      110.6     +10.6      112.36     +.60
     May        22.3   22.358     83.4    18.65      119.6     +19.6      384.16    +1.11
     June       25.4   22.386     79.4    17.77      142.9     +42.9    1,840.41    +2.42
     July       26.1   22.414     89.4    20.04      130.2     +30.2      912.04    +1.71
     Aug.       27.7   22.442     97.3    21.84      126.8     +26.8      718.24    +1.51
     Sept.      29.0   22.470    106.3    23.89      121.4     +21.4      457.96    +1.21
     Oct.       31.0   22.498    118.3    26.62      116.5     +16.5      272.25     +.93
     Nov.       36.0   22.526    136.3    30.70      117.3     +17.3      299.29     +.98
     Dec.       34.5   22.554    127.0    28.64      120.5     +20.5      420.25    +1.16

Total of column 8                                                      22,611.38

σ = √(Σd²/N) = √(22,611.38 / 72) = √314.05 = 17.7, where d is the cycle percent in column 7 and N = 72 months.

* Five or six years is usually too short a period to give the best results in the analysis of time 
series by the methods used in Worksheets No. 72 and No. 73. Ten or more years would usually give 
better results. The only reason for reducing the time to the shorter period in this case is to economize 
space. The shorter period is sufficient to make the methods clear to the student. 

We began our time series analysis to discover the trend, seasonal 
variation, and cycle of the original data. Worksheet No. 72 
completes this task. In column 2 are the original data by 
months. In column 3 is the monthly trend. In column 4 is the 
seasonal index. In column 7 is the cycle in percentage deviations 
from normal. For most purposes the analysis is complete at the end 
of column 7. Column 9 is computed to express the cycle in terms of 
the standard deviation of the percentage figures. The two cycles in 
columns 7 and 9 are identical in shape, relative size, and monthly 
signs; they differ only in that the first is in percentages of normal 
and the second is in 
WORKSHEET NO. 73

Complete Analysis of the Trend, Seasonal Variation, and Cycle
of Automobile Production in U.S., 1937-1941

    1              2           3          4            5            6
Year and    Original     Secular  Actual as     Seasonal     Cycle in
Month           Data       Trend Percentage        Index   Percentage
             (1,000)     (1,000)   of Trend  (3d Method)        4 - 5
1937 Jan.      309.5      273.35      113.2        124.3        -11.1
     Feb.      296.5      273.48      108.5        112.0         -3.5
     Mar.      403.9      273.61      147.6        125.8        +21.8
     Apr.      440.0      273.74      160.7        122.5        +38.2
     May       426.4      273.87      155.3        108.4        +46.9
     June      411.4      274.00      150.1        102.5        +47.6
     July      360.4      274.13      131.5         61.4        +70.1
     Aug.      311.5      274.26      113.6         23.2        +90.4
     Sept.     118.7      274.39       43.3         63.1        -19.8
     Oct.      298.7      274.52      108.8        104.8         +4.0
     Nov.      295.2      274.65      107.5        117.0         -9.5
     Dec.      244.4      274.78       88.9        135.0        -46.1
1938 Jan.      155.5      274.91       56.6        124.3        -67.7
     Feb.      139.4      275.04       50.7        112.0        -61.3
     Mar.      174.1      275.17       63.3        125.8        -62.5
     Apr.      176.1      275.30       64.0        122.5        -58.5
     May       155.0      275.43       56.3        108.4        -52.1
     June      136.5      275.56       49.5        102.5        -53.0
     July      106.8      275.69       38.7         61.4        -22.7
     Aug.       58.6      275.82       21.2         23.2         -2.0
     Sept.      65.2      275.95       23.6         63.1        -39.5
     Oct.      187.5      276.08       67.9        104.8        -36.9
     Nov.      320.3      276.21      116.0        117.0         -1.0
     Dec.      326.0      276.34      118.0        135.0        -17.0
1939 Jan.      281.5      276.47      101.8        124.3        -22.5
     Feb.      243.0      276.60       87.9        112.0        -24.1
     Mar.      299.7      276.73      108.3        125.8        -17.5
     Apr.      273.4      276.86       98.8        122.5        -23.7
     May       237.9      276.99       85.9        108.4        -22.5
     June      246.7      277.12       89.0        102.5        -13.5
     July      150.7      277.25       54.4         61.4         -7.0
     Aug.       61.4      277.38       22.1         23.2         -1.1
     Sept.     161.6      277.51       58.2         63.1         -4.9
     Oct.      251.8      277.64       90.7        104.8        -14.1
     Nov.      285.3      277.77      102.7        117.0        -14.3
     Dec.      373.8      277.90      134.5        135.0         -0.5
1940 Jan.      363.0      278.03      130.6        124.3         +6.3
     Feb.      337.8      278.16      121.4        112.0         +9.4
     Mar.      352.9      278.29      126.8        125.8         +1.0
     Apr.      362.7      278.42      130.2        122.5         +7.7
     May       325.7      278.55      116.9        108.4         +8.5
     June      286.0      278.68      102.6        102.5         +0.1
     July      168.8      278.81       60.5         61.4         -0.9
     Aug.       46.8      278.94       16.8         23.2         -6.4
     Sept.     224.5      279.07       80.5         63.1        +17.4
     Oct.      421.3      279.20      150.9        104.8        +46.1
     Nov.      407.1      279.33      145.7        117.0        +28.7
     Dec.      396.5      279.46      141.9        135.0         +6.9
1941 Jan.      411.2      279.59      147.1        124.3        +22.8
     Feb.      394.5      279.72      141.0        112.0        +29.0
     Mar.      410.2      279.85      146.6        125.8        +20.8
     Apr.      375.0      279.98      133.9        122.5        +11.4
     May       417.7      280.11      147.1        108.4        +38.7
     June      419.0      280.24      149.5        102.5        +47.0
     July      343.7      280.37      122.6         61.4        +61.2
     Aug.       78.5      280.50       28.0         23.2         +4.8
     Sept.     167.8      280.63       59.8         63.1         -3.3
     Oct.      295.6      280.76      105.3        104.8         +0.5
     Nov.      266.1      280.89       94.7        117.0        -22.3
     Dec.      175.0      281.02       63.3        135.0        -71.7
standard deviations of those percentages. The purpose of computing 
the last cycle is to make it more readily comparable with 
other cycles. Two or more cycles stated in percentages are not 
easily compared because the range of the percentages in 
one cycle may differ widely from that in another. For instance, the 
percentage variations of the cycle of bread prices might possibly range 
from 10% above normal to 15% below normal, while wheat price 
cycles would certainly range from 100% or more above normal to 
60 to 80% below normal. The wheat cycle would be so deep and 
the bread cycle so shallow that close comparisons between the 
two would be difficult. It would be somewhat like comparing a 
pig and an elephant. Although these two animals are much 
alike in shape, they differ so much in size that most persons might 
see little similarity between them. When, however, two or more 
cycles are each expressed in units of their own standard deviation, 
they tend to have about the same amplitude. This makes comparison for 
forecasting purposes easier. 
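A short Python sketch can make the conversion concrete. It is an illustration, not the author's procedure: it takes the cycle percentages of column 7 of Worksheet No. 72, computes σ as the square root of the average squared deviation from normal (the calculation implied by column 8 and its total of 22,611.38 for 72 months), and divides each month by σ. Rounding σ to 17.7 before dividing, as the printed column 9 appears to do, is an assumption; the function names are hypothetical.

```python
import math

# Column 7 of Worksheet No. 72: cycle in percent of normal, Jan. 1936 - Dec. 1941.
CYCLE = [
    8.7, 58.2, 9.2, 6.4, 17.1, 27.7, 15.4, 11.9, 12.2, 11.2, 17.9, 11.7,                # 1936
    6.9, 22.9, 24.0, 18.9, 14.7, 14.4, 6.2, -2.1, -1.2, -11.7, -11.3, -9.8,             # 1937
    -5.2, -5.9, -6.1, -4.5, 9.5, 13.7, 6.1, 0.7, 1.3, -0.5, -6.9, -7.2,                 # 1938
    -19.2, -9.5, -11.8, -13.4, -12.7, -12.3, -20.8, -26.8, -21.5, -22.2, -20.7, -31.3,  # 1939
    -8.5, 15.0, -14.2, -14.7, -10.2, -10.9, -20.0, -24.2, -18.0, -22.6, -22.0, -10.7,   # 1940
    -20.3, -11.2, -8.2, 10.6, 19.6, 42.9, 30.2, 26.8, 21.4, 16.5, 17.3, 20.5,           # 1941
]


def cycle_sigma(deviations):
    """Standard deviation of the cycle about normal (zero): sqrt(sum(d^2) / N)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def in_sigma_units(deviations, sigma=None):
    """Divide each percentage deviation by sigma (its own sigma unless one is given)."""
    if sigma is None:
        sigma = cycle_sigma(deviations)
    return [d / sigma for d in deviations]


sigma = cycle_sigma(CYCLE)
# Column 9, dividing by sigma rounded to one decimal as the worksheet seems to do.
column_9 = [round(x, 2) for x in in_sigma_units(CYCLE, round(sigma, 1))]
print(f"sigma = {sigma:.2f}")
print(column_9[:12])
```

Because σ grows in proportion to the size of the swings, a shallow cycle and a deep one of the same shape come out identical in σ units: in the sketch, `in_sigma_units([d / 10 for d in CYCLE])` returns the same values as `in_sigma_units(CYCLE)`, apart from floating-point rounding. That is the pig-and-elephant point in numerical form.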

The methods followed in Worksheet No. 73 for securing trend, 
seasonal variation, and cycle are the shortest and easiest available 
and give fairly good results. The trend is from Worksheet No. 52 
(p. 379). The seasonal variation is from Worksheet No. 63 (p. 401). 
Since the seasonal index in the Third Method is based on the percentage 
figures of column 4 in Worksheet No. 73, there is almost 
no additional labor required to compute the seasonal variation by 
this method. Since the cycle is assumed to have an additive rather 
than a multiplicative relation to normal, it is also obtained with the 
least possible labor. The entire worksheet reduces to six columns 
instead of the seven required for the ratio or multiplicative relationship. 
By means of the three Worksheets, No. 52, No. 63, 
and No. 73, time series analysis is made relatively short and 
simple. 

The cycle of the prices of fresh eggs, Chicago, 1936-1941, in 
terms of the standard deviation is shown in Fig. 84. 

The cycle of automobile production, 1937-1941, is shown in 
terms of percentage deviations in Fig. 85. By expanding or exaggerating 
the y-scale it is possible to make the cyclical movements 
from month to month stand out in bold relief. 


Fig. 84. Cycle of price of fresh eggs, Chicago, 1936-1941, in standard 
deviations. (From Column 9 in Worksheet No. 72) 

Fig. 85. Cyclical fluctuations of automobile production, 
1937-1941. (From Column 6 in Worksheet No. 73) 
DIRECT METHOD OF COMPUTING CYCLE 

As the previous sections of this chapter indicate, the Residual 
Method of computing cycles leaves much to be desired. Since 
all errors in computing trend and seasonal variation under the 
residual method affect and distort the cycle, it is preferable to be 
able to make a direct statistical analysis of the cycle by means of 
averages and dispersions based on the original data. This direct 
measurement of the cycle has been developed by Wesley C. 
Mitchell and Arthur F. Burns and was first published in Bulletin 
57, July 1, 1935, by the National Bureau of Economic Research. 
Dr. Simon Kuznets and others aided in perfecting the techniques, 
theory, and computations. 

It is not our purpose in this elementary text to reproduce all of 
the many complicated tables and methods which appear in 
Bulletin 57 and the subsequent publications of Mitchell, Burns 
and Kuznets, but only to introduce the student to the basic and 
essential points in this analysis as compared with the other 
method. Students who intend to do creative research in this 
field should certainly study Mitchell’s works in detail. 

Essential Steps 

The essential steps in the direct analysis of business cycles are: 

A. Securing the Cycles 

1. Run a 12-month moving average through the original data 
to eliminate seasonal variation. See Worksheet No. 74. 

2. Plot the de-seasonalized data on coordinate graph paper as 
in Fig. 86. 

3. Divide the plotted data line into specific cycles by marking 
the low points and the high points as indicated in Fig. 86, by the x’s. 

B. Measuring Length of Cycles 

4. For each cycle, record in a comparative table the date of 
(1) Revival, (2) Peak, and (3) Trough as indicated in Worksheet 
No. 75. 

5. For each cycle, record in the worksheet (4) the length of the period 
of expansion and (5) the length of the period of contraction, in months, 
together with (6) the total length of the cycle. 

Fig. 86. Data for automobile production in U.S., 1914-1932, corrected 
for seasonal variation by centered 12-month moving average, with cycles 
marked. Vertical scale: automobile production in thousands. (From Worksheet No. 74) 

6. The periods (7) expansion, and (8) contraction are stated as 
percentages of the total length of the cycle in months. 

7. The average of each of the columns indicated in steps No. 5 
and No. 6 and the average deviation of the items from the mean 
are computed. 

The computations in steps Nos. 4-7 give us the average length 
of the cycles and their deviation from that average. 


C. Measuring the Amplitude of Cycles 

8. In Worksheet No. 76 the amplitude of the cycles is measured 
by considering the average of the original de-seasonalized data 
for each cycle as 100. 

9. With the base in No. 8 as 100, compute the percentage level 
of (2) the initial trough; (3) the peak; (4) the terminal trough; 
(5) the amount of rise from (2) to (3); (6) the amount of fall from 
(3) to (4); and (7) the total amount of rise and fall for the entire 
cycle, together with the amount of change per month for the 
rise, the fall, and the total rise and fall. This computation gives a 
double measure of the amplitude of each cycle: (1) by totals and 
(2) by months. 

10. The average is computed for each of the points in No. 9, 
and the average deviation of the cycles from their mean. 

D. Computing Secular Trend of Cycles 

11. Since the direct analysis of business cycles is based on the 
original data, it necessarily includes the secular trend, which 
must be removed to obtain a pure cycle. This correction is made 
by computing the percent of change in trend from one cycle 
to the next (1) from the preceding phase of the same cycle and 
(2) from the same phase of the preceding cycle. This is shown in 
Worksheet No. 77. 

E. Cycle Patterns, or Contours 

12. Each cycle is divided at nine points and the amount of 
change between each pair of points measured. This reveals the 

WORKSHEET NO. 74

Data of Automobile Production in U.S., 1914-1932, in Thousands,
Corrected for Seasonal Variation by Centered
12-Month Moving Average

Year  Jan.  Feb.  Mar.  Apr.   May  June  July  Aug. Sept.  Oct.  Nov.  Dec.
1914    41    43    45    47    47    45    44    43    46    48    51    54
1915    48    59    63    74    70    75    81    87    91    96   102   106
1916   110   112   115   118   122   125   128   130   131   132   138   134
1917   136   138   139   142   143   140   137   134   130   127   123   122
1918   117   112   102    92    81    79    79    79    80    81    83    87
1919    91    98   107   118   131   138   145   150   156   156   158   162
1920   166   168   169   166   162   159   150   141   133   132   129   127
1921   126   125   124   124   122   120   123   128   131   137   142   149
1922   153   160   163   170   179   190   202   214   229   240   250   257
1923   263   269   279   289   296   302   307   314   316   316   310   300
1924   295   290   287   281   274   266   260   252   250   254   262   274
1925   283   281   281   292   303   311   317   322   328   328   329   327
1926   324   338   345   336   327   315   309   299   295   292   291   287
1927   280   271   261   252   242   239   240   248   250   251   252   258
1928   267   277   288   301   311   318   330   339   351   366   377   386
1929   393   396   397   396   387   377   368   357   341   327   315   301
1930   284     ?   247   229   229   231   223   215   208   200   193   187
1931   208   206     ?   196   191   189   186   185   173   160   151   141
1932   109   102    99    97    96    95    96    96    96    99   100   110
cycle contour as to whether it rises rapidly and falls slowly or 
rises slowly and falls rapidly or is irregular in movement. This 
contour is measured in Worksheet No. 78. 

13. Several other characteristics of the cycles are measurable 
by the direct method, such as lead or lag with the general busi- 
ness cycle, month-to-month trend, and other details. 
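Step 1 can be sketched in Python. Because the average of twelve months falls between two months, it is assumed here (the text does not spell this out) that the average is centered by taking the mean of two adjacent 12-month averages. The data are hypothetical: a straight-line trend plus a fixed seasonal pattern whose twelve monthly effects add to zero, so the smoothed series should return the trend alone. The function name is also hypothetical.

```python
def centered_12_month_average(values):
    """Centered 12-month moving average of a monthly series.

    Each result is the mean of two overlapping 12-month averages
    (months i-6..i+5 and i-5..i+6), so it is centered on month i.
    The first six and last six months have no value (None)."""
    n = len(values)
    result = [None] * n
    for i in range(6, n - 6):
        earlier = sum(values[i - 6:i + 6]) / 12
        later = sum(values[i - 5:i + 7]) / 12
        result[i] = (earlier + later) / 2
    return result


# Hypothetical data: trend starting at 100 and rising 2 a month, plus a
# seasonal pattern whose twelve monthly effects add to zero.
pattern = [10, 5, 0, -5, -10, -5, 0, 5, 10, 5, -5, -10]
series = [100 + 2 * t + pattern[t % 12] for t in range(36)]

smoothed = centered_12_month_average(series)
for t in range(6, 12):
    print(t, series[t], smoothed[t])
```

Any twelve consecutive months contain each seasonal effect exactly once, which is why the seasonal swing cancels; an irregular or changing seasonal pattern would not cancel so neatly.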


WORKSHEET NO. 75

Length of Specific Cycles of Automobile Production
in U.S., 1914-1932

                                         Duration of Cyclical Movements
   Dates of Specific Cycles              Months                              Percentages
   Revival      Peak         Trough      Expansion  Contraction  Full Cycle   Expansion  Contraction
   1            2            3           4          5            6            7          8
1. Aug., 1914   May, 1917    July, 1918  34         14           48           71         29
2. Aug., 1918   Mar., 1920   June, 1921  20         15           35           57         43
3. July, 1921   Sept., 1923  Sept., 1924 27         12           39           69         31
4. Oct., 1924   Mar., 1926   June, 1927  18         15           33           55         45
5. July, 1927   Mar., 1929   June, 1932  21         39           60           35         65
Arithmetic Mean                          24         19           43           57         43
Mean Deviation                           5          8            9            10         10

LENGTH OF SPECIFIC CYCLES




After the monthly de-seasonalized data in Worksheet No. 74 
are plotted in Fig. 86, the lowest and highest points in each 
cycle are marked with an X, and a heavy vertical line is drawn 
through the lowest points to separate the cycles. The 
revival month and year, the peak month and year, and the terminal 
trough month and year of each cycle are written in columns 
1, 2, and 3 of Worksheet No. 75. The number of months in the 
expansion leg of the cycle, from the initial revival to the peak, is 
written in column 4; the number of months in the contraction leg, 
from the peak to the trough, is written in column 5; and the total 
length of each cycle in months is written in column 6. With the 
total length of the cycle in months as 100%, the expansion and 
the contraction periods are reduced to percentages of the total 
duration of the cycle in columns 7 and 8. The mean and mean 
deviation are computed for each of the five columns 4 to 8. For the 
five cycles of automobile production the mean expansion period is 
24 months with a mean deviation of 5 months. The mean contraction 
period is 19 months with a mean deviation of 8 months. The average 
length of the five cycles is 43 months with a mean deviation of 
9 months. Fifty-seven percent of the average automobile cycle is 
expansion, while 43 percent is contraction; the mean deviation of 
each of these percentages is 10. Although a set of means based on a 
larger number of cycles would give more dependable results, this 
direct statistical attack on the automobile production cycle gives 
quite a clear picture of how this industry varies from prosperity 
to depression. Automobile production cycles vary from 33 to 60 
months and average about three and one-half years in length. 
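The arithmetic of Worksheet No. 75 can be checked with a small Python sketch. Counting both the revival month and the peak month in the expansion, and the months after the peak through the terminal trough in the contraction, is an assumption, but it reproduces the printed durations. The mean deviation is the average absolute deviation from the arithmetic mean, as in the text; the helper names are hypothetical.

```python
# Dates of the specific cycles in Worksheet No. 75 as (year, month):
# (revival, peak, terminal trough).
CYCLES = [
    ((1914, 8), (1917, 5), (1918, 7)),
    ((1918, 8), (1920, 3), (1921, 6)),
    ((1921, 7), (1923, 9), (1924, 9)),
    ((1924, 10), (1926, 3), (1927, 6)),
    ((1927, 7), (1929, 3), (1932, 6)),
]


def month_index(year_month):
    year, month = year_month
    return year * 12 + month


def durations(revival, peak, trough):
    """Expansion, contraction, and full length in months, plus the two
    phases as percentages of the full cycle (columns 4 to 8)."""
    expansion = month_index(peak) - month_index(revival) + 1   # revival..peak
    contraction = month_index(trough) - month_index(peak)      # after peak..trough
    full = expansion + contraction
    return (expansion, contraction, full,
            100 * expansion / full, 100 * contraction / full)


def mean(xs):
    return sum(xs) / len(xs)


def mean_deviation(xs):
    """Average absolute deviation of the items from their arithmetic mean."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


rows = [durations(*cycle) for cycle in CYCLES]
columns = list(zip(*rows))
labels = ["Expansion", "Contraction", "Full cycle", "Expansion %", "Contraction %"]
for label, column in zip(labels, columns):
    print(f"{label:14} mean {mean(column):5.1f}   mean deviation {mean_deviation(column):5.1f}")
```

Rounded to whole numbers, the printed means and mean deviations match the bottom two lines of the worksheet.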


WORKSHEET NO. 76

Amplitude of Specific Cycles of Automobile Production
in U.S., 1914-1932

                                         Standing (Cycle Average = 100)       Total Movement
   Dates of Specific Cycles              At Initial   At Peak   At Terminal     Rise   Fall   Rise and
   Revival      Peak         Trough      Trough                 Trough                        Fall
   1                                     2            3         4               5      6      7
1. Aug., 1914   May, 1917    July, 1918  42           137       77              95     60     155
2. Aug., 1918   Mar., 1920   June, 1921  61           130       94              69     36     105
3. July, 1921   Sept., 1923  Sept., 1924 53           133       107             80     26     106
4. Oct., 1924   Mar., 1926   June, 1927  86           114       82              28     32     60
5. July, 1927   Mar., 1929   June, 1932  95           155       38              60     117    177
Arithmetic Mean                          67           134       80              66     54     120
Mean Deviation                           18           10        18              18     27     36


AMPLITUDE OF SPECIFIC CYCLES 

The amplitude of cycles is measured on the same principle as seasonal 
variation. The average of the entire cycle is taken as 
100%, and the low and high points are stated as percentages of 
this average. The average monthly automobile production from 
August, 1914, to July, 1918, inclusive, as given in Worksheet 
No. 74, is taken as 100%. The average production of the three 
months centered on August, 1914, that is, July, August, and September, 
is taken as the initial trough and is expressed as a percentage 
of the cycle average. In this case the figure is 42 in Worksheet 
No. 76. The value for the peak is the average centered on May, 
1917, that is, the average production of April, May, and June, 1917. 
This figure is also expressed as a percentage of the cycle average. In 
this case it is 137. The value of the terminal trough is the monthly 
average of the three months centered on July, 1918, that is, June, 
July, and August, 1918, expressed as a percentage of the average 
of the cycle. In this case the figure is 77. The figures for the 
other cycles are obtained by the same method. The average of 
each cycle is the base of 100 for that cycle. 
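The standings of the first cycle can be recomputed from Worksheet No. 74 with a Python sketch such as the following (the names are hypothetical). Run on the printed monthly figures, it gives about 43, 137, and 76 for the initial trough, peak, and terminal trough, against 42, 137, and 77 in Worksheet No. 76, so the author's working may have differed slightly in rounding or in the monthly values used. Dividing the rise by the 34 months of expansion and the fall by the 14 months of contraction, to get change per month, is an assumption about how step 9 is carried out.

```python
# De-seasonalized automobile production (thousands), Worksheet No. 74,
# from July 1914 (index 0) through August 1918 (index 49).
AUTO = [
    44, 43, 46, 48, 51, 54,                                      # 1914 July-Dec.
    48, 59, 63, 74, 70, 75, 81, 87, 91, 96, 102, 106,            # 1915
    110, 112, 115, 118, 122, 125, 128, 130, 131, 132, 138, 134,  # 1916
    136, 138, 139, 142, 143, 140, 137, 134, 130, 127, 123, 122,  # 1917
    117, 112, 102, 92, 81, 79, 79, 79,                           # 1918 Jan.-Aug.
]
START, PEAK, END = 1, 34, 48   # Aug. 1914, May 1917, July 1918


def cycle_base(values, start, end):
    """Average monthly value over the whole cycle, start..end inclusive (= 100)."""
    return sum(values[start:end + 1]) / (end - start + 1)


def standing(values, point, base):
    """Three-month average centered on `point`, as a percentage of the base."""
    return 100 * sum(values[point - 1:point + 2]) / 3 / base


base = cycle_base(AUTO, START, END)
initial, peak, terminal = (standing(AUTO, p, base) for p in (START, PEAK, END))
rise, fall = peak - initial, peak - terminal
months_up, months_down = PEAK - START + 1, END - PEAK   # 34 and 14 months
print(f"initial {initial:.0f}  peak {peak:.0f}  terminal {terminal:.0f}")
print(f"rise {rise:.0f} ({rise / months_up:.1f} a month), "
      f"fall {fall:.0f} ({fall / months_down:.1f} a month), total {rise + fall:.0f}")
```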


SECULAR MOVEMENT IN CYCLES 

Since the direct statistical attack on business cycles begins 
with the original data, which necessarily include whatever trend 
there is in the business, a part of the direct analysis is the 
measurement of the trend. The theory underlying this method is 
that if there were no trend, that is, if the trend were perfectly 
level, the average difference between the levels of successive cycles 
would be zero. On the average they would all be on the same level. If, 
however, there is an upward trend, each succeeding cycle will 
tend to be on a higher level. If the trend is downward, each 
succeeding cycle will tend to be on a lower level. This change in 
level from cycle to cycle may be measured in several ways. Two 
methods are introduced here. The first method is to measure the 
change in the average standing of each phase of each cycle 
relative to the phase immediately preceding it. This measurement 
is made in columns 5 and 6 in Worksheet No. 77. 

The Average Monthly Standing of the cycles in columns 2, 3, 
and 4 in Worksheet No. 77 is obtained as follows: 

1. The original de-seasonalized values of the data in Worksheet 
No. 74 are averaged for each expansion period and each contraction 
period of each cycle. The 34 months from August, 1914, to 
May, 1917, are averaged for the first expansion period of the 
first cycle. The average is 100, which is written in column 2 in 
Worksheet No. 77. The data values for the contraction period, June, 
1917, to July, 1918, or 14 months, average 113, which is 
written in column 3 in Worksheet No. 77. The average for the 
total cycle is 103 and is written in column 4. 

2. The Average Monthly Standing values for all the other 
cycles are obtained by the same method. To compute these values, 
the cycle dates and lengths in columns 1 to 6 of Worksheet 
No. 75 and the monthly data in Worksheet No. 74 are necessary. 

The + 13 in column 6 of Worksheet No. 77 means that the 
average height of the cycle in its contraction period, which is 
113, is 13% above the preceding expansion level in column 2. 

The + 16 in column 5, second line, means that the expansion 
period of the second cycle is 16% above the contraction level of 
the first cycle. The level rose from 113 to 131 which is 16% up. 

This second cycle fell from 131 to 130, or -1%, in its second 
phase. The next cycle, however, in its expansion period rose to 
217, which is 67% above the 130 of the second phase of the preceding 
cycle. All the other figures in columns 5 and 6 are obtained by 
the same methods. 

Since the average of the four expansion periods is up 20% and 
the contraction periods on the average are down only 1%, the 
analysis shows that there is an upward trend of approximately 
20% per cycle. 

Columns 7, 8, and 9 measure the trend for each phase of each 
cycle as related to the same phase of the preceding cycle. The 
expansion period of the second cycle is 31% above the expansion 
period of the first cycle (131 against 100). The contraction period 
of the second cycle is 15% above the contraction period of the first 
cycle (130 against 113). The average upward trend from one expansion 
phase to the next expansion phase is +28%. For the contraction phase 
it is +25%. For the cycles as a whole the upward trend averages 24%. 
This rapid upward trend is clearly visible in Fig. 86. 
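Both comparisons are simple percentage changes. The Python sketch below uses only the five phase averages quoted in the text (100, 113, 131, 130, and 217); the remaining figures of Worksheet No. 77 are not reproduced in this section, and the names are hypothetical.

```python
# Average monthly standing of successive phases (expansion, contraction,
# expansion, ...), from the figures quoted in the text for Worksheet No. 77.
PHASES = [100, 113, 131, 130, 217]


def percent_change(earlier, later):
    return 100 * (later - earlier) / earlier


# Columns 5 and 6: each phase compared with the phase just before it.
from_preceding_phase = [round(percent_change(a, b)) for a, b in zip(PHASES, PHASES[1:])]
# Columns 7 and 8: each phase compared with the same phase of the preceding cycle.
from_same_phase = [round(percent_change(a, b)) for a, b in zip(PHASES, PHASES[2:])]

print(from_preceding_phase)
print(from_same_phase)
```

The first list reproduces the +13, +16, -1, and +67 discussed for columns 5 and 6; the second begins with the +31 and +15 cited for columns 7 and 8.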

This direct measurement of the trend from cycle to cycle does 
not remove the trend. It merely measures it. 

The last characteristic of the cycle which we shall measure in 
this direct analysis is the Cycle Pattern or Contour as indicated 
in Worksheet No. 78. 

1. The three principal measuring points of each cycle, (1) the 
initial trough, (2) the peak, and (3) the terminal trough, as 
indicated in columns 2, 3, and 4 of Worksheet No. 76, are written in 
columns 2, 6, and 10 of Worksheet No. 78. (This method is 
slightly different from that used by Mitchell, but is easier for the 
beginning student and is sufficiently accurate.) 

2. The expansion phase of each cycle is divided into four parts. 
In cycle one, with 34 months of expansion, the height of the cycle is 
measured about every 8 months. In cycle two, with 20 months of 
expansion, it is measured every 5 months, and so on with the 
expansion phase of each cycle. The height of the cycle at the end 
of the first quarter of the expansion phase is written in column 3 
of Worksheet No. 78. The height of the cycle at the middle of the 
expansion phase is written in column 4, and the value at the end 
of the third quarter in column 5. 

3. The contraction is also divided into four quarters. For the 
first cycle, with a contraction period of 14 months, the measure 
of height is taken every three and one-half months. For the second 
cycle, with a 15-month contraction period, the level is measured 
about every 4 months. The value for the first quarter is written 
in column 7, the second quarter in column 8, and the third quarter 
in column 9. 

4. This method splits each entire cycle into eight time seg- 
ments with a measure of cycle level for each segment. From these 
measures the contour of the cycle can be plotted as in Fig. 87. 
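Reading the level at the end of a quarter often falls between two months (every three and one-half months in the first contraction). The text does not say how such a point is read, so the Python sketch below assumes straight-line interpolation between neighboring months. It applies this to the contraction of the first cycle, May 1917 to July 1918, from Worksheet No. 74, and expresses the levels as percentages of the cycle average, 4,962 ÷ 48 = 103.375 thousand (the mean of August 1914 through July 1918 in that worksheet). The names are hypothetical.

```python
# De-seasonalized production (thousands), May 1917 (index 0) to July 1918 (index 14),
# from Worksheet No. 74.
CONTRACTION = [143, 140, 137, 134, 130, 127, 123, 122, 117, 112, 102, 92, 81, 79, 79]
CYCLE_AVERAGE = 4962 / 48   # average of Aug. 1914 - July 1918, taken as 100


def quarter_points(length):
    """Positions, in months from the start of a phase, of the ends of its
    first, second, and third quarters."""
    return [length * q / 4 for q in (1, 2, 3)]


def level_at(values, position):
    """Value at a possibly fractional month, interpolating linearly."""
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return float(values[lower])
    return values[lower] + fraction * (values[lower + 1] - values[lower])


length = len(CONTRACTION) - 1          # 14 months from peak to terminal trough
points = quarter_points(length)
levels = [level_at(CONTRACTION, p) for p in points]
relatives = [round(100 * v / CYCLE_AVERAGE, 1) for v in levels]
print(points, levels, relatives)
```

The same two functions serve for the expansion quarters; only the starting month and the phase length change.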

Automobile production cycles rise most rapidly during the 
middle half of the expansion period and fall most rapidly during 
the first half of the contraction period. Their variation from 
the average is more marked at the beginning and end of the cycle 
and narrower during the middle two-thirds of the cycle's duration. 

Fig. 87. Average automobile production cycle and deviations. 
(From Worksheet No. 78) 

The details of the methods presented above differ in some minor 
points from Mitchell's analysis, but in the main follow the principles 
of his direct statistical attack on the cycles. In Worksheet 
No. 78 we have measured the height of the cycle at the end of 
each quarter of expansion and at the end of each quarter of 
contraction, while Mitchell measures it at the middle of each third. 
We have also left out the measurement of Reference Cycles and 
of lead and lag, as matters which can be studied more profitably by 
students who are specializing in this field and are taking advanced 
courses in business cycles. Our purpose is to show the basic 
principles of the direct analysis of the cycle as compared with 
the older and more widely used residual method. 

SUMMARY 

1. Cyclical fluctuations in business and social activities are usually 
not uniform in either duration or amplitude. 

2. There may be two or more series of cycles of varying length and 
amplitude running concurrently. 

3. Cycles may be measured by the residual method which consists of 
computing out the portions of the total time series activity which are 
due to secular trend and seasonal variation and then calling all that re- 
mains cyclical fluctuation. This method is often quite inaccurate because 
it includes with the cycle all the accidental changes besides those por- 
tions of the trend and seasonal change which may not have been removed. 

4. The direct method attacks the cycle directly with the prime statistical devices of averages, deviations, ratios, and percentages, just as seasonal variation and secular trend are attacked by direct methods.

5. The relationship between cycles and seasonal variation may be either additive or multiplicative.

6. The relationship between cycles and secular trend may be either additive or multiplicative.

7. The similarity of the fluctuations of two or more cycles may be measured by computing the coefficient of correlation between them for several different amounts of lag. First, the identical months of the two cycles may be correlated. Second, one cycle may be dropped back one month, then two months, and so on for as long a lag as is desired. The length of lag that gives the maximum correlation is the one that shows the closest similarity of the cycles.
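The lag procedure in the last summary item can be sketched in Python. The two monthly series are made-up illustrative numbers, not data from the text, built so that the second series repeats the first two months later; `pearson` and `best_lag` are hypothetical helper names.

```python
import math


def pearson(xs, ys):
    """Coefficient of correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def best_lag(leader, follower, max_lag):
    """Correlate leader[t] with follower[t + k] for k = 0..max_lag.

    Returns the lag with the highest r, that r, and the full table.
    """
    table = {}
    for k in range(max_lag + 1):
        # Drop the follower back k months before pairing the months.
        table[k] = pearson(leader[:len(leader) - k], follower[k:])
    lag = max(table, key=table.get)
    return lag, table[lag], table


# Hypothetical 12-month cycle, repeated twice (illustrative numbers only).
cycle = [0, 2, 5, 7, 5, 2, 0, -2, -5, -7, -5, -2] * 2
leader = cycle[2:]        # runs two months ahead of the follower
follower = cycle[:22]

lag, r, table = best_lag(leader, follower, max_lag=6)
for k, value in table.items():
    print(f"lag {k} months: r = {value:+.3f}")
print("lag with maximum correlation:", lag)
```

Each extra month of lag shortens the paired stretch by one month, so very long lags rest on fewer observations.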
REVIEW QUESTIONS

1. Differentiate a business cycle from irregular and accidental varia- 
tions. 

2. What is meant by “the business cycle” and “a business cycle”?
How many business cycles are there? Explain in detail. 

3. What types of cycles fluctuate most widely? less widely? Why? 

4. What is the “residual method” of business cycle analysis? What
are its strong points and its weak points? 

5. Are the relationships among trend, seasonal variation, and cycle 
additive or multiplicative? What is meant by additive and multiplicative 
in relation to business cycles? 

6. What are the differences between the two methods of computing 
the cycle in Worksheet No. 71? What are the advantages and disadvan- 
tages of each? 

7. What are the reasons for thinking that business cycles expand 
and contract in proportion to the steepness of the trend? 

8. What is meant by the phrase, “direct analysis of the business 
cycle”? 

9. Name and explain the three steps in securing the cycles for the 
direct analysis. 

10. Why is a 12-month moving average run through the data before 
the cycles are measured? 
11. How is the duration or length of the cycles measured in the direct
method? Explain fully. 

12. How is the amplitude of the cycles measured in the direct method? 

13. How is the secular trend of cycles measured in the direct method? 

14. How is the Pattern, or Contour, of business cycles measured in 
the direct method? Explain fully. 

15. What are the advantages and disadvantages of the direct method 
over the residual method of measuring business cycles? Explain fully. 

EXERCISES 

1. Compute the business cycle by the residual method for the follow- 
ing data: 

1. Price of hogs on the Chicago Market 

2. Production of pig iron 

3. Exports from the United States 

4. Freight car loadings 

5. Petroleum production 

The above data may be found in the Survey of Current Business, 
Standard and Poor's, and in Moody's.

2. Compute the measurement of the business cycle for each of the 
following series of data by the direct method, using at least three com- 
plete cycles: 

1. Production of pig iron

2. Production of bituminous coal 

3. Freight car loadings 

4. Factory employment 

5. Total imports into the United States 

The above data may be found in the Survey of Current Business, 
Standard and Poor's, and in Moody's.



CHAPTER 19 

INDEX NUMBERS 


Index numbers are a specialized type of average. They measure 
the central tendency of a group of time series or spatial series.
It is evident that one cannot judge the direction of the general 
movement of prices by one price series. For instance, the price 
of strawberries may rise at the same time that the prices of other 
fruits are falling. The change in strawberry prices may be caused 
by the weather, which may be unfavorable, or by the seasonal 
variation in prices. Likewise, the price of one particular stock 
or bond may fall while most other securities show rising prices. 
To measure the general movement of prices an average of several 
time series is necessary. 

In measuring the volume of production the same principle 
holds. Fewer buttons may be produced at the same time that 
other production is rising. The decrease in buttons may be 
caused by an increase in zippers and not by depression. Prices 
and production may decline in some areas and increase in other 
localities or markets. In any case, a correct measure of the level 
of either price or production for either time periods or geographic 
subdivisions or areas can be obtained only by averaging the prices 
or quantities of a representative sample of the population to be 
measured. The principles of sampling, probability, error, averages, 
and variation apply to all index numbers. 

It is a matter of common knowledge that economic activity is 
not uniform. Periods of prosperity and depression, of low prices 
and high prices, of idle factories and full employment chase each 
other through the years like sunshine and cloudy weather. Prices 
and production and employment and profits and rentals and
interest rates rarely remain at the same level for more than a few 
months at a time. Often they change from week to week. It is 
evident that an informed and intelligent management of business 
requires accurate and up-to-date information on these economic 
changes. Index numbers are the answer to a large part of this 
need. In fact, it is probable that the average person consumes 
or uses more index numbers than any other form of statistics. 
The business sections of all newspapers and all trade journals 
are full of them. They are the signs and guideposts along the 
business highway that indicate to the businessman how he should 
drive, or manage, his affairs. 

PRICE AND PRODUCTION RELATIVES 

The data of a single series when reduced to percentages are 
usually referred to as relatives. By contrast the average of two 
or more series is called an index. The production of wheat by 
years for any area, when the production figures are changed to 
percentages, would be wheat production relatives. When the
prices of wheat per month or year are changed to percentages 
they are called wheat price relatives. Some statisticians do not 
make this distinction between relatives and index numbers. One 
series cannot be averaged; several series can be combined only 
by an average. One series is much less likely to be a good measure of an economic activity or field than a fair sample of series. Because of these permanent and fundamental differences, we wish to distinguish clearly between a relative based on one series and an index based on several series. We shall reserve the term index number for the averaged series only.

WORKSHEET NO. 79

Year     Wheat Production in U.S.     Wheat Production
         in Bushels                   Relatives
1926     833,544,000                  100
1927     874,733,000                  105
1928     912,961,000                  110
1929     822,180,000                   99
1930     889,702,000                  107
1931     932,221,000                  112
1932     745,788,000                   89
1933     528,975,000                   63
1934     496,929,000                   60
1935     603,199,000                   72

The usefulness of reducing large production figures to simple 
percentages is quite evident. The ordinary person finds it very 
difficult to compare accurately and quickly figures running into 
hundreds of millions. The percentage figures are small and easy 
to understand and to compare. 

WORKSHEET NO. 80

Wheat Price Relatives, Chicago

Year     Wheat Prices     Wheat Price
         per Bushel       Relatives
1926     $1.40            100
1927      1.38             99
1928      1.17             84
1929      1.30             93
1930       .84             60
1931       .53             38
1932       .53             38
1933       .94             68
1934      1.02             73
1935      1.01             73


In computing production relatives or price relatives, one period or area is selected as a base equal to 100, and the production or price of that period is divided into the prices or production of the other periods or areas. The results, expressed as percentages, are what we call relatives. This method is excellent for comparing changes in two or more series which differ a great deal in size, since it reduces their variations to a comparable base.
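Computing relatives takes one division per period. Here is a minimal Python sketch using the wheat production figures of Worksheet No. 79, with 1926 as the base. `relatives` is a hypothetical helper name, and the results are rounded to whole numbers as in the worksheet.

```python
def relatives(values, base_value):
    """Express each value as a percentage of the base value (base = 100)."""
    return [100 * v / base_value for v in values]


years = list(range(1926, 1936))
wheat_bushels = [833_544_000, 874_733_000, 912_961_000, 822_180_000,
                 889_702_000, 932_221_000, 745_788_000, 528_975_000,
                 496_929_000, 603_199_000]

base = wheat_bushels[years.index(1926)]      # 1926 = 100
for year, rel in zip(years, relatives(wheat_bushels, base)):
    print(year, round(rel))
```

Any other period can serve as the base. Only the line that picks `base` changes.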


PROBLEMS IN THE CONSTRUCTION OF INDEX NUMBERS

The statistician must make seven decisions in the construction 
of an index number. They are: 
1. Deciding what to measure;

2. Determining the periods or areas to be measured; 

3. Selecting the series for the index; 

4. Selecting the base; 

5. Determining the type of average; 

6. Selecting the weights; 

7. Testing the reliability of the index. 

These decisions must usually be made in the order in which 
they are here listed. However, they are not entirely independent
of each other and a decision on one of them necessarily affects 
some or all of the others. The making of an accurate index num- 
ber, therefore, requires a careful preliminary study of the field 
to be measured in all its phases. 

Deciding What to Measure 

Most economic activities are so varied and complex that it is 
very difficult or even impossible to measure them as a whole with 
any large degree of accuracy or meaning. Our analysis of trend, 
seasonal variation, and cycle illustrates this point. If we are to 
make an index that is dependable, should we eliminate trend and 
measure the combined seasonal and cycle, or shall we also elimi- 
nate seasonal and create an index measuring only the cycle? We 
may do either or both. If we do not eliminate the trend, we may 
find our index a mixture of opposite or contradictory trends. If 
we do not eliminate seasonal variation, we may find that our data 
are selected in such a way that they overemphasize the month-to- 
month changes of business. The question, shall we measure trend, 
seasonal variation, cycle, or two of them or all, remains and must 
be decided before we can begin to build our index. Index num- 
bers may be made for any specific market, field of production, type 
of employment, or other business activity, but in any case the 
exact field that is to be measured must be predetermined. In 
other words, we must have a definite and delimited goal for our 
index. 

Determining Periods or Areas to be Measured
Index numbers may be made by decades, such as 1800, 1810, 
1820, etc., or by years, as 1930, 1931, 1932, etc., or by months or 
weeks, or even days, or any other convenient time period. They
may be made by states, counties, cities, townships, precincts, 
city blocks, or by nations or continents, or any other convenient 
geographical subdivision or area. In any case, the time period or 
area to be measured must be decided upon as a preliminary to 
collecting the data. 

Index numbers for agricultural production are usually made by 
years; those for farm and industrial prices are made by months; 
industrial production and employment are usually measured by 
months although there are some weekly indexes in these fields; 
stock and bond quotations are given by days, weeks, and months. 
Population changes are often measured by decades. 

Selecting the Series for the Index 

After deciding upon the particular phase of business or other ac- 
tivity which is to be measured and the time periods or areas at or in 
which the measurement is to be made, the selection of the data is 
the next step. The data chosen necessarily depend on the pre- 
liminary decisions. It may be that sufficient data are not available 
to make the type of index desired. It may be desired to make a 
weekly index, but most of the data may be in monthly figures. 
Or a monthly index may be needed but the data are available only 
in quarterly statements. Such necessities may compel an altera- 
tion of the index. 

Index numbers are usually based on purposive sampling instead of pure random selection. The sample should be of sufficient size to give dependable results. Since an index number is a specialized type of measure, a controlled stratified sample (see Chapter 14) is usually better. The appropriateness of including any particular series in an index may be checked by correlating it with other known characteristics of the variable to be measured. Professor C. E. Lively and R. B. Almack of Ohio State University, in their study A Method of Determining Rural Social Sub-Areas with Applications to Ohio, Part I and Part II,* employ and illustrate these methods.

* Ohio State University, Columbus, Ohio.

Selecting the Base

The base period or area of an index is the unit that is chosen to 
equal 100%. The values of the base period or area are divided into 
the values of the other periods or areas. The base value may be 
the price or quantity of a single time period, day, month, year, etc., 
or of a single unit area, as a county, state, city, etc., or it may be 
the average of several time periods or of many geographic sub- 
divisions. In any case, it is better to have a base that is as near a 
middle or median value as possible. The highest or lowest value in 
the series is less desirable as a base than a middle one. In agri- 
cultural production an average of three to five or even more years 
is better than a single year because of the erratic fluctuation of 
farm production caused by irregular weather changes such as 
droughts, floods, frosts, and the like. In measuring population 
changes by counties in a state or region, the county mean of the 
state or region is a better base than the value for any one county. 
The same principle applies to all spatial indexes. 
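The sketch below illustrates a base built from the average of several years. It reuses the Worksheet No. 79 wheat figures and takes the mean of 1926-1928 as the base. The three-year span is an arbitrary choice for illustration, not one the text prescribes.

```python
from statistics import mean

years = list(range(1926, 1936))
wheat_bushels = [833_544_000, 874_733_000, 912_961_000, 822_180_000,
                 889_702_000, 932_221_000, 745_788_000, 528_975_000,
                 496_929_000, 603_199_000]

# Base = arithmetic mean of the chosen base years (an illustrative choice).
base_years = [1926, 1927, 1928]
base = mean(wheat_bushels[years.index(y)] for y in base_years)

index = {y: 100 * v / base for y, v in zip(years, wheat_bushels)}
for y in years:
    print(y, f"{index[y]:.1f}")
```

With an averaged base no single year equals 100. The base years instead average to 100.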

Determining the Type of Average 

Any kind of average may be used in making index numbers, but 
the arithmetic mean and the geometric mean are more appropri- 
ate and more widely used than any other measures. The mode 
may be dismissed as too erratic. It will be recalled that the median 
is usually employed in indexes of seasonal variation. It may also 
be used for other types of indexes but with less reason than in time 
series in which one erratic month might seriously modify the entire 
index. If a sample of sufficient size of relatively homogeneous data 
can be obtained, the arithmetic mean is a dependable method for 
production and quantity indexes. The geometric mean is better 
for price changes and percentages. (See Chapter 9 on Averages.) 

There are hundreds of formulas for computing index numbers. 
A few of these cover all important possibilities and are used almost 
universally. The others are only unimportant variations of the 
basic relations. The most widely used types of index numbers are 
the aggregative and the average of price relatives. Our computations are limited to these methods.






Selecting the Weights 

All index numbers are weighted. If no conscious effort is made to 
assign to each series a special weight, they are all weighted equally 
with a weight of one. Such weighting, or lack of intelligent 
weighting, usually gives an inaccurate or unrepresentative index. 
The purpose of weighting is to make the index truly representative 
of the population it is to measure. A good illustration of the need 
of such weighting would be an index of food costs which placed 
meat, vegetables, salt, and pepper on an equal basis. In such a 
case if the price of pepper rose $0.10 a can while bacon fell $0.02 
a pound, the index would show a rise in the cost of living, while 
actually there would be a decline because one would use many 
pounds of bacon for one can of pepper. Such an index would be 
incorrectly weighted. An aggregative index should be weighted by
the quantity used, bought, sold, or produced. Quantity weights are 
usually easy to obtain and make an aggregative index fit the 
economic facts. 
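The pepper-and-bacon illustration can be worked through numerically. The base prices ($0.15 a can, $0.30 a pound) and the quantities (one can of pepper against 20 pounds of bacon) are hypothetical figures chosen only to show the effect. The price changes are the ones given in the text.

```python
# Hypothetical base prices and quantities; the price changes are from the text.
base_price = {"pepper (can)": 0.15, "bacon (lb)": 0.30}
new_price = {"pepper (can)": 0.15 + 0.10, "bacon (lb)": 0.30 - 0.02}
quantity = {"pepper (can)": 1, "bacon (lb)": 20}

items = list(base_price)

# Equal weights: every item counts once, whatever the amount used.
unweighted = sum(new_price[i] for i in items) / sum(base_price[i] for i in items)

# Quantity weights: each price is multiplied by the amount bought.
weighted = (sum(new_price[i] * quantity[i] for i in items)
            / sum(base_price[i] * quantity[i] for i in items))

print(f"equally weighted index:  {100 * unweighted:.1f}")
print(f"quantity-weighted index: {100 * weighted:.1f}")
```

With equal weights the index rises. With quantity weights it falls, which matches the actual change in the family's food bill.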

An average of price relatives index should be weighted with values. 
In computing price relatives the relative size of the values is re- 
moved and all are reduced to the common base of 100%. To make 
such an index realistic, these value differentials must be restored 
by including them in the weights. The student must remember two points: (1) aggregative indexes have quantity weights; (2) price relative indexes have value weights. You will recall from your economics that Value = Price × Quantity; V = PQ.

INDEX NUMBER FORMULAS 

At this point under the discussion of methods of averaging and 
weighting indexes, it is necessary to introduce the basic formulas. 

Meaning of symbols: 

I = index        V = value

$p_0$ = price of base year

$p_1$ = price of given year (any year except the base year)

$q_0$ = quantity of base year

$q_1$ = quantity of given year

Formula No. 68

$$I = \frac{\sum p_1}{\sum p_0}$$

for an unweighted aggregative index of prices.

Formula No. 69

$$I = \frac{\sum p_1 q_0}{\sum p_0 q_0}$$

for a weighted aggregative index of prices.

Formula No. 70

$$I = \frac{\sum \frac{p_1}{p_0}}{N}$$

for an unweighted average of price relatives, N being the number of series.

Formula No. 71

$$I = \frac{\sum \left(\frac{p_1}{p_0}\right) V}{\sum V}$$

for a weighted average of price relatives weighted with values.

Formula No. 72

$$\log I = \frac{\sum \left(\log \frac{p_1}{p_0}\right) V}{\sum V}$$

for the weighted geometric average of price relatives index, using logarithms.

Formula No. 73

$$I = \frac{\sum q_1 p_0}{\sum q_0 p_0}$$

for quantity indexes. Quantity is weighted by price.
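Formulas 68 to 73 translate directly into code. The sketch below is a hedged Python rendering in which each function returns a ratio; multiply by 100 for a percentage. The function names are hypothetical. The `values` argument stands for whichever value weights V are chosen; the worksheets use base-period values, p0 × q0.

```python
import math


def unweighted_aggregative(p0, p1):              # Formula No. 68
    return sum(p1) / sum(p0)


def weighted_aggregative(p0, p1, q0):            # Formula No. 69
    return (sum(b * q for b, q in zip(p1, q0))
            / sum(a * q for a, q in zip(p0, q0)))


def unweighted_relatives(p0, p1):                # Formula No. 70
    return sum(b / a for a, b in zip(p0, p1)) / len(p0)


def weighted_relatives(p0, p1, values):          # Formula No. 71
    return (sum((b / a) * v for a, b, v in zip(p0, p1, values))
            / sum(values))


def weighted_geometric(p0, p1, values):          # Formula No. 72
    log_index = (sum(math.log10(b / a) * v for a, b, v in zip(p0, p1, values))
                 / sum(values))
    return 10 ** log_index                       # antilog of the mean log


def quantity_index(q0, q1, p0):                  # Formula No. 73
    return (sum(b * p for b, p in zip(q1, p0))
            / sum(a * p for a, p in zip(q0, p0)))


# Tiny two-item example; all prices and quantities are made up.
p0, p1 = [1.0, 2.0], [2.0, 2.0]
q0, q1 = [1.0, 1.0], [2.0, 1.0]
base_values = [a * q for a, q in zip(p0, q0)]    # V = p0 * q0

for name, result in [
    ("68", unweighted_aggregative(p0, p1)),
    ("69", weighted_aggregative(p0, p1, q0)),
    ("70", unweighted_relatives(p0, p1)),
    ("71", weighted_relatives(p0, p1, base_values)),
    ("72", weighted_geometric(p0, p1, base_values)),
    ("73", quantity_index(q0, q1, p0)),
]:
    print(f"Formula No. {name}: {result:.4f}")
```

With V = p0 q0, Formula 71 reduces algebraically to Formula 69, so those two always agree.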
TESTING THE RELIABILITY OF INDEX NUMBERS

As with all other statistics computed from samples, the question arises as to how dependable or representative the index is of the population it is supposed to measure. Fig. 88 shows the dispersion of the price relatives of ten active stocks on the New York Stock Exchange.

Fig. 88. Percentage changes in the prices of ten active stocks on the New York Stock Exchange, 1933. (See Worksheet No. 87)

In the base period, October, all of the relatives are, of course, 100. In November all except two decline. Three of the ten fall from 17 to 24 percent. Five of them remain within five percent of each other. The sample does not show very uniform change. In the month of December the changes are still more divergent. Four of the stocks rise, one of them by more than 20%. Two remain constant and another almost level. Five rise by a small percent and only one falls. This sample, which was taken at random and includes Industrials, Utilities, and Railroads, is too small to indicate clearly the movement of so large a section of the market. If three indexes based on three larger samples from each of the three sections of the market had been taken, the results would have been much more dependable. An index number with fewer than twenty to forty series in it is likely to have a considerable error.

In Fig. 89 the same relationships are shown for the price relatives of nine Oklahoma farm crops.

Fig. 89. Percentage changes in prices of nine agricultural commodities in Oklahoma. (See Worksheet No. 86)

The homogeneity of the sample and the uniformity of the price movements are clearly indicated. All of these prices fall from 1930 to 1931, and six out of nine remain within 10% of each other. One falls slightly more and two considerably less than the others. In 1932 five of the series are within seven percent of each other, and the total spread of prices has decreased from 37% in 1931 to 28% in 1932. Not only do these prices move in the same direction, but most of them change by about the same amount.

In the case of the change in stock prices, the standard error of the mean is 2.4 (92.1 ± 2.4) for November and 4.0 (98.2 ± 4.0) for December. For the Oklahoma farm prices the standard error of the mean is 4.2 (58.7 ± 4.2) for 1931 and 3.3 (43.6 ± 3.3) for 1932. The reliability of a smaller index may be checked by correlating it with a much larger index, the dependability of which is definitely known.
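The text does not say how its standard errors were computed. This sketch assumes the sample standard deviation (divisor n - 1) divided by the square root of n. That gives the printed 2.4 for the November stock relatives of Worksheet No. 86 and 4.2 for the 1931 farm relatives of Worksheet No. 85. For December it gives about 4.1 against the printed 4.0. `mean_and_se` is a hypothetical helper name.

```python
import math
from statistics import mean, stdev


def mean_and_se(relatives):
    """Mean of the relatives and the standard error of that mean."""
    n = len(relatives)
    return mean(relatives), stdev(relatives) / math.sqrt(n)


# November 1933 relatives of the ten stocks (Worksheet No. 86).
stocks_nov = [96.8, 82.7, 77.8, 100.0, 95.8, 86.0, 95.2, 93.2, 100.0, 93.1]
# 1931 relatives of the nine Oklahoma farm prices (Worksheet No. 85).
farm_1931 = [49.3, 54.1, 58.8, 53.8, 58.8, 51.4, 43.5, 79.8, 78.8]

for label, data in [("stocks, Nov.", stocks_nov), ("farm, 1931", farm_1931)]:
    m, se = mean_and_se(data)
    print(f"{label}: {m:.1f} ± {se:.1f}")
```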


UNWEIGHTED AGGREGATIVE INDEXES 

The aggregative index is made by adding the values for each 
time period, selecting a base, and dividing the base aggregate into 
the totals for each of the other time periods. In the so-called 
unweighted aggregative, each item has a weight of one regardless
of its importance. The greatest defect of this index is that com- 
modities sold in large units, such as hundredweights or tons, have 
a large dollar value per unit, and, therefore, exert an altogether 
disproportionate weight in the index. Cottonseed sold by the 
ton and cotton lint sold by the pound are excellent examples of 
this inequality. 

WORKSHEET NO. 81

Unweighted Aggregative Index of Oklahoma Farm Prices

Items                      Prices     Prices     Prices
                           1930       1931       1932
Wheat, bushel              $ 0.71     $ 0.35     $ 0.31
Corn, bushel                 0.74       0.40       0.22
Oats, bushel                 0.34       0.20       0.13
Barley, bushel               0.52       0.28       0.19
Cotton Lint, pound           0.097      0.057      0.056
Cottonseed, ton             22.08      11.36       8.36
Potatoes, bushel             1.31       0.57       0.53
Sweet Potatoes, bushel       1.19       0.95       0.61
Hay, ton                     7.87       6.20       4.53
Totals                     $34.857    $20.367    $14.936
Index                      100.0       58.4       42.8


The index in Worksheet No. 81 is completely dominated by cottonseed and hay because these two commodities are sold in large units, or tons. Cotton lint, which is actually of much more importance than either cottonseed or hay, has almost no weight in the index, because it is sold by the pound. If cotton lint were sold by the ton, even at 5 cents a pound, it would be worth $100.00 a ton and in turn would dominate the index. Efforts have been made to overcome this inequality of weight by reducing all commodity prices to the same unit size. Dun and Bradstreet reduced all prices to “per pound” prices. This is unsatisfactory. It leads to the ridiculous result of selling hay, coal, silk, and diamonds by the pound. An unweighted aggregate of commodity prices cannot be made a good index.
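The dominance of the items priced by the ton can be checked directly. Here is a minimal Python sketch of Worksheet No. 81 (Formula No. 68), using the worksheet prices with 1931 wheat at $0.35, the figure used in Worksheets No. 83 and No. 85. It also reports what share of the 1930 aggregate comes from cottonseed and hay.

```python
items = ["wheat", "corn", "oats", "barley", "cotton lint",
         "cottonseed", "potatoes", "sweet potatoes", "hay"]
prices = {
    1930: [0.71, 0.74, 0.34, 0.52, 0.097, 22.08, 1.31, 1.19, 7.87],
    1931: [0.35, 0.40, 0.20, 0.28, 0.057, 11.36, 0.57, 0.95, 6.20],
    1932: [0.31, 0.22, 0.13, 0.19, 0.056, 8.36, 0.53, 0.61, 4.53],
}

# Unweighted aggregative: each price counts once, in whatever unit it is quoted.
totals = {year: sum(p) for year, p in prices.items()}
index = {year: 100 * totals[year] / totals[1930] for year in prices}

# Share of the base-year aggregate contributed by the two items sold by the ton.
ton_items = [items.index("cottonseed"), items.index("hay")]
ton_share = sum(prices[1930][i] for i in ton_items) / totals[1930]

for year in prices:
    print(year, f"total ${totals[year]:.3f}", f"index {index[year]:.1f}")
print(f"cottonseed + hay: {100 * ton_share:.0f}% of the 1930 total")
```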

WORKSHEET NO. 82

Unweighted Aggregative Index of Active Corporation Stocks
on the New York Stock Exchange for October,
November, and December, 1933

Ten Stocks            October    November    December
Am. Can                  93          90         100
Am. Car & Fdy.           29          24          24
Am. Pow. & Light          9           7           7
Am. Smelt. & R.          45          45          44
Am. Tel. & Tel.         120         115         119
Am. Tobacco              86          74          75
Ches. & Ohio             42          40          41
Chrysler Co.             44          41          53
Gen. Electric            20          20          21
Penn. R.R.               29          27          31
Totals                  517         483         515
Index                 100.0        93.4        99.6


An unweighted aggregative of stock prices is a better index 
than an aggregate of commodity prices, because stocks and bonds 
are sold in more uniform units, but even here the index needs in- 
telligent weighting to be accurate. 

The weights used in the Weighted Aggregative of Oklahoma 
Farm Prices are the average quantities produced in Oklahoma 
during the five years 1928-1932 inclusive, as given in the United States Yearbook of Agriculture.
WORKSHEET NO. 83

Weighted Aggregative Index of Oklahoma Farm Prices

                       Wt.          1930              1931              1932
Items               (millions)  Price   Wtd.      Price   Wtd.      Price   Wtd.
                                        Price             Price             Price
Wheat, bu.             55.1     $ .71   $ 39.1    $ .35   $ 19.3    $ .31   $ 17.1
Corn, bu.              83.7       .74     61.9      .40     33.5      .22     18.4
Oats, bu.              26.7       .34      9.1      .20      5.3      .13      3.5
Barley, bu.             1.4       .52       .7      .28       .4      .19       .3
Cotton Lint, lb.      554.5       .097    53.8      .057    31.6      .056    31.1
Cottonseed, ton          .5     22.08     11.0    11.36      5.7     8.36      4.2
Potatoes, bu.           3.3      1.31      4.3      .57      1.9      .53      1.7
Sweet Potatoes, bu.     1.4      1.19      1.7      .95      1.3      .61       .9
Hay, ton                1.1      7.87      8.7     6.20      6.8     4.53      5.0
Totals                                  $190.3            $105.8            $ 82.2
Index                                    100.0              55.6              43.2

Source: Agricultural Yearbook, 1936.


The actual average quantities are:

Wheat, bushels               55,145,000
Corn, bushels                83,667,000
Oats, bushels                26,711,000
Barley, bushels               1,354,000
Cotton Lint, pounds         554,500,000
Cottonseed, tons                493,000
Potatoes, bushels             3,272,000
Sweet Potatoes, bushels       1,376,000
Hay, tons                     1,078,000


The actual quantity weights may be used in their full size as given above or as relative figures. To use the actual figures, which run into the millions, would make the numbers unduly large for most calculating machines. The same results can be obtained by cutting off five places and expressing the reduced numbers in millions, as shown in Worksheet No. 83. For instance, 55.1 stands for 55,145,000 bushels and 83.7 for 83,667,000 bushels. The same results can also be obtained by reducing the original quantities to percentages of the total of all the quantities. This weighted index is somewhat different from the unweighted one. It is theoretically much more logical and actually much more dependable. The weighted aggregative method of making index numbers is mathematically sound and is widely used in both price and quantity indexes.
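The weighted aggregative calculation of Worksheet No. 83 (Formula No. 69) can be sketched the same way, with the quantity weights in millions as in the worksheet. The worksheet rounds each weighted price to one decimal before adding, and this sketch does not. Its 1932 figure therefore comes out near 43.1 rather than the printed 43.2.

```python
# Quantity weights: 1928-1932 average production, in millions of units.
weights = [55.1, 83.7, 26.7, 1.4, 554.5, 0.5, 3.3, 1.4, 1.1]
prices = {
    1930: [0.71, 0.74, 0.34, 0.52, 0.097, 22.08, 1.31, 1.19, 7.87],
    1931: [0.35, 0.40, 0.20, 0.28, 0.057, 11.36, 0.57, 0.95, 6.20],
    1932: [0.31, 0.22, 0.13, 0.19, 0.056, 8.36, 0.53, 0.61, 4.53],
}


def weighted_total(p, q):
    """Sum of price times quantity weight over all items."""
    return sum(pi * qi for pi, qi in zip(p, q))


base_total = weighted_total(prices[1930], weights)
for year, p in prices.items():
    total = weighted_total(p, weights)
    print(year, f"${total:.1f}", f"index {100 * total / base_total:.1f}")
```

Because the same weights are used for every year, the index changes only when prices change.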
Quantity weights in thousands of shares of stocks sold. Source: Commercial and Financial Chronicle.


AVERAGE OF PRICE RELATIVES 

A second method of computing index numbers is the average 
of price relatives. By this method the actual data of each series 
are changed to price relatives by dividing all the prices of a series 
by a base period price. These price relatives of the several series 
are then averaged to make the index. 

WORKSHEET NO. 85

Unweighted Arithmetic Average of Price Relatives
Index of Oklahoma Farm Income

                          1930               1931               1932
Items                 Price  Relative    Price  Relative    Price  Relative
Wheat, bu.           $ .71     100      $ .35     49.3     $ .31     43.7
Corn, bu.              .74     100        .40     54.1       .22     29.7
Oats, bu.              .34     100        .20     58.8       .13     38.2
Barley, bu.            .52     100        .28     53.8       .19     36.5
Cotton Lint, lb.       .097    100        .057    58.8       .056    57.7
Cottonseed, ton      22.08     100      11.36     51.4      8.36     37.9
Potatoes, bu.         1.31     100        .57     43.5       .53     40.4
Sweet Potatoes, bu.   1.19     100        .95     79.8       .61     51.2
Hay, ton              7.87     100       6.20     78.8      4.53     57.5
Totals                         900               528.3              392.8
Index                          100                58.7               43.6

Source: Agricultural Yearbook, 1936.

In the average of price relatives method a base period is selected 
and the data for each item are changed to price relatives with the 
value of the base period as 100. The relatives for each time period 
are totaled and these totals divided by the number of items. This 
unweighted index is not identical with the unweighted aggrega- 
tive. In using the relatives of the data the absolute difference in 
the size of the units in which prices are quoted is eliminated. Al- 
though the unit of price for cotton lint is pounds and the unit 
for cottonseed is tons, and one price is in cents per pound and the 
other price is in dollars per ton, the relatives for the base year are 
identical, 100 for each item. The sizes of the relatives for the 
other time periods are determined only by the relative amount of
change in the price of each item between the base period and the 
given period and not by the size of the price unit. While this de- 
vice avoids one of the faults of the unweighted aggregative index, 
it still does not give a good index unless the relatives are weighted 
properly. Only a logically or scientifically weighted index can 
give an accurate measure of the population it is designed to 
represent. 
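The unweighted arithmetic average of relatives of Worksheet No. 85 follows Formula No. 70. This sketch computes the relatives from the worksheet prices without rounding them first. Its 1932 result (about 43.7) therefore differs slightly from the printed 43.6, while the 1931 result agrees at 58.7. The unweighted aggregative (Formula No. 68) is computed alongside to show that the two unweighted methods do not coincide.

```python
prices = {
    1930: [0.71, 0.74, 0.34, 0.52, 0.097, 22.08, 1.31, 1.19, 7.87],
    1931: [0.35, 0.40, 0.20, 0.28, 0.057, 11.36, 0.57, 0.95, 6.20],
    1932: [0.31, 0.22, 0.13, 0.19, 0.056, 8.36, 0.53, 0.61, 4.53],
}


def average_of_relatives(base, given):
    """Formula No. 70: simple arithmetic mean of the price relatives."""
    relatives = [100 * g / b for b, g in zip(base, given)]
    return sum(relatives) / len(relatives)


def aggregative(base, given):
    """Formula No. 68, for comparison: ratio of the price totals."""
    return 100 * sum(given) / sum(base)


for year, p in prices.items():
    print(year,
          f"average of relatives {average_of_relatives(prices[1930], p):.1f}",
          f"aggregative {aggregative(prices[1930], p):.1f}")
```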

WORKSHEET NO. 86

Unweighted Average of Price Relatives Index of Active
Corporation Stocks on the New York Stock Exchange
for October, November, and December, 1933*

                      October           November          December
Ten Stocks          Price Relative    Price Relative    Price Relative
Am. Can               93    100         90    96.8       100   107.5
Am. Car & Fdy.        29    100         24    82.7        24    82.7
Am. Pow. & Light       9    100          7    77.8         7    77.8
Am. Smelt. & R.       45    100         45   100.0        44    97.8
Am. Tel. & Tel.      120    100        115    95.8       119    99.2
Am. Tobacco           86    100         74    86.0        75    87.2
Ches. & Ohio          42    100         40    95.2        41    97.6
Chrysler Co.          44    100         41    93.2        53   120.5
Gen. Electric         20    100         20   100.0        21   105.0
Penn. R.R.            29    100         27    93.1        31   106.9
Totals                     1,000             920.6             982.2
Index                      100.0              92.1              98.2

* Source: Commercial and Financial Chronicle.


October is selected as the base to equal 100, and the relatives 
for the other months are obtained by dividing the October prices 
into the prices of the other months. 

Value weights are used in a weighted average of price relatives 
index. Value is price times quantity. The quantities used may be 
those of the base period or of some other period. They may be an 
average of quantities extending over several periods. In agri- 
cultural production it is especially necessary to use the average 
quantity of the crop produced over a period of years, because of
the fluctuations of annual yields. In the above index, the quan- 
tity used is the average of five years, the period 1928-1932 in- 
clusive. The value weights are the products of these average 
yields times the base year price. The value weights may be used 
as absolute figures in dollars. 

Often a preferable method is to reduce the absolute dollar 
values to percentages as is done in Worksheet No. 87. This 
greatly facilitates the calculations if the index covers a large num- 
ber of items and time periods. 

When the weights and prices of the base year of an aggregative 
index are used as value weights in computing a weighted average 
of price relatives index, the two indexes are identical. Both are 
widely used in the construction of business indexes and both 
give fairly good results. 
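The identity stated above can be checked numerically with the stock data. This sketch takes the October values of Worksheet No. 89 as the weights V and derives the base quantities as V divided by the October price. It then computes November and December both ways, and the two agree to rounding error. The resulting index levels are not printed in this chapter and serve only to illustrate the check.

```python
import math

oct_price = [93, 29, 9, 45, 120, 86, 42, 44, 20, 29]
nov_price = [90, 24, 7, 45, 115, 74, 40, 41, 20, 27]
dec_price = [100, 24, 7, 44, 119, 75, 41, 53, 21, 31]
# Base-period (October) values from Worksheet No. 89.
oct_value = [1_822_800, 174_000, 222_300, 8_325_000, 6_756_000,
             851_400, 1_029_000, 17_278_800, 1_260_000, 913_500]

# Base-period quantities implied by value = price x quantity.
oct_qty = [v / p for v, p in zip(oct_value, oct_price)]


def weighted_aggregative(p0, p1, q0):       # Formula No. 69
    return 100 * sum(b * q for b, q in zip(p1, q0)) / sum(a * q for a, q in zip(p0, q0))


def weighted_relatives(p0, p1, v):          # Formula No. 71 with V = p0 * q0
    return 100 * sum((b / a) * w for a, b, w in zip(p0, p1, v)) / sum(v)


for month, p1 in [("Nov.", nov_price), ("Dec.", dec_price)]:
    agg = weighted_aggregative(oct_price, p1, oct_qty)
    rel = weighted_relatives(oct_price, p1, oct_value)
    print(month, f"{agg:.2f}", f"{rel:.2f}", "identical:", math.isclose(agg, rel))
```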

WORKSHEET NO. 89

Weights for Weighted Average of Price Relatives Index of
Active Corporation Stocks on New York Stock Exchange

                     Value in Base Period    Value in Base Period
Ten Stocks           Actual                  Percentages of Total
Am. Can              $ 1,822,800               4.72
Am. Car & Fdy.           174,000                .45
Am. Pow. & Light         222,300                .58
Am. Smelt. & R.        8,325,000              21.55
Am. Tel. & Tel.        6,756,000              17.49
Am. Tobacco              851,400               2.20
Ches. & Ohio           1,029,000               2.66
Chrysler Co.          17,278,800              44.73
Gen. Electric          1,260,000               3.26
Penn. R.R.               913,500               2.36
Totals               $38,632,800             100.00


The value weights to be used in Worksheet No. 88 are the 
weighted prices (3d column under October) in Worksheet No. 84.
The weighted price (actual value) of American Can sold in Octo- 
ber was $1,822,800. This is too large a figure to use in Worksheet 
No. 88 or to use on the calculating machine. It is better to reduce 
it to the percentage figure 4.72. The total equals 100%. 
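The reduction to percentages can be sketched in a few lines of Python. The sketch takes the actual base-period values listed in Worksheet No. 89. It divides each value by their total and rounds to two decimals, as the worksheet does. The dictionary layout is simply a convenient choice for the illustration.

```python
# Actual base-period values (dollars) from Worksheet No. 89.
values = {
    "Am. Can": 1_822_800,
    "Am. Car & Fdy.": 174_000,
    "Am. Pow. & Light": 222_300,
    "Am. Smelt. & R.": 8_325_000,
    "Am. Tel. & Tel.": 6_756_000,
    "Am. Tobacco": 851_400,
    "Ches. & Ohio": 1_029_000,
    "Chrysler Co.": 17_278_800,
    "Gen. Electric": 1_260_000,
    "Penn. R.R.": 913_500,
}

total = sum(values.values())
# Each stock's weight as a percentage of the total, rounded as in the worksheet.
percent = {name: round(100 * v / total, 2) for name, v in values.items()}

print(f"Total: ${total:,}")       # Total: $38,632,800
print(percent["Am. Can"])         # 4.72
print(percent["Chrysler Co."])    # 44.73
```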
Source of data, U.S. Agricultural Yearbook.

Use of Geometric Mean in Making Index Numbers

The only difference between the methods of the arithmetic
mean and the geometric mean is this. In using the geometric mean,
instead of averaging the price relatives, one weights and averages
the logarithms of the price relatives. The antilogarithm of that
average is then taken as the index. This method reduces the effect
of the larger relatives on the result. A price can never fall more
than 100%, but it may rise many thousand percent. Suppose a few
of the prices included in the index have risen several hundred
percent. The arithmetic mean will give these few very high prices
such a large effect that the index will be too high to give a true
picture of the general price movement. The geometric mean
minimizes the effect of the few very high prices and so gives a
more accurate measure of the general movement. Review the
Geometric Mean in Chapter 9.
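The logarithmic procedure just described can be sketched in Python. The sketch assumes common (base-10) logarithms, as a table of logs would use. It takes hypothetical relatives and weights, not data from the text: one price falls to a quarter of its base and another rises fourfold. It compares the weighted arithmetic and weighted geometric averages. The function names are illustrative.

```python
import math

def weighted_arithmetic_index(relatives, weights):
    """Weighted arithmetic mean of price relatives."""
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)

def weighted_geometric_index(relatives, weights):
    """Weighted geometric mean: weight and average the logs, then take the antilog."""
    total_weight = sum(weights)
    mean_log = sum(w * math.log10(r) for r, w in zip(relatives, weights)) / total_weight
    return 10 ** mean_log   # antilogarithm of the weighted mean log

# Hypothetical relatives (base = 100): one price falls to 25, one rises to 400.
relatives = [25.0, 400.0]
weights = [1, 1]

print(weighted_arithmetic_index(relatives, weights))           # 212.5
print(round(weighted_geometric_index(relatives, weights), 1))  # 100.0
```

The geometric average treats a fall to one-quarter and a fourfold rise as offsetting. The arithmetic average, by contrast, is pulled far upward by the single large rise.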

Changing Basis of Index Numbers 

The base of any index number may be shifted from the original
base period to any other period for which the index has been cal-
culated. This change of base is computed by dividing the value
for each period in the completed index by the value of the period
which is desired as the new base, and multiplying by 100. The
process may be illustrated with the values of the geometric average
of price relatives, recomputed on new bases in Worksheet No. 91
below.


WORKSHEET NO. 91

        Present Index    New Index    New Index
Year    Base 1930        Base 1931    Base 1932

1930        100.0          180.9        240.4
1931         55.3          100.0        132.9
1932         41.6           75.2        100.0


This is a very useful device in the comparison and manipulation
of index numbers. It often occurs that one wishes to compare
two or more index numbers which have been originally computed
with different bases. One index may have 1913 as a base, another
may have 1926 and another 1930 or some other period. By the
method indicated above, all these indexes may be shifted to the
same base, say 1936. This shift makes possible their comparison
without recalculating them from the beginning.
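The change of base can be sketched as a small Python function. It takes the present index (base 1930) from Worksheet No. 91. It divides every value by the value of the chosen new base period, multiplies by 100, and rounds to one decimal. The function name `shift_base` is hypothetical.

```python
def shift_base(index, new_base):
    """Re-express every value of a completed index as a percentage of the new base period's value."""
    base_value = index[new_base]
    return {period: round(100 * value / base_value, 1) for period, value in index.items()}

present = {1930: 100.0, 1931: 55.3, 1932: 41.6}   # base 1930, Worksheet No. 91

print(shift_base(present, 1932))  # {1930: 240.4, 1931: 132.9, 1932: 100.0}
print(shift_base(present, 1931))  # {1930: 180.8, 1931: 100.0, 1932: 75.2}
```

Working from the rounded published values can move the last digit. Here 1930 comes out as 180.8 rather than the worksheet's 180.9, presumably because the worksheet was computed from unrounded figures.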

QUANTITY INDEX

Suppose one wished to measure fluctuations in the mineral production
of California. The period to be covered should be decided, a sample
of mineral production series selected, and the data averaged.
Although the sample is small, the method is illustrated in Work-
sheet No. 92. It is decided to measure the fluctuations in Cal-
ifornia's mineral production in 1930, 1935, and 1940. The pro-
duction of copper, gold, silver, and petroleum is selected for
the sample. A larger sample would be preferable, but these four
items illustrate the principle just as well. The units are 1,000 lbs.
for copper, 1,000 ozs. for gold and silver, and 1,000 bbls. for oil.
At once the student faces the problem of how to add such diverse
units as pounds, ounces, and barrels. Some common denominator
must be found. Value, that is, worth measured in dollars, is the
most available measure. Price, or quantity in dollars, weights each
commodity according to its relative value in the market. These
price weights may be chosen as of some particular time or may be
the average of a period of time. In this case they are for the year
1940. The quantity of each mineral produced in each period is
multiplied by the price weight, which must be uniform for all
periods, and the values (QP) are totaled. The total value for the
base period, which is 1930 in this case, is divided into the total
values for all the periods and expressed as percentages.

WORKSHEET NO. 92

Computation of Mineral Production Index for California,
1930, 1935, 1940

                                 1930                          1935                          1940
                                 Price                         Price                         Price
Mineral              Production  Weights    Value   Production Weights    Value   Production Weights    Value

Copper (1,000 lbs.)     570,897  $ .09   $ 51,381      278,519  $ .09   $ 25,067     574,533  $ .09   $ 51,708
Gold (1,000 ozs.)           450  35.00     15,750          870  35.00     30,450       1,444  35.00     50,540
Silver (1,000 ozs.)       1,434  0.775      1,111        1,065  0.775        825       2,225  0.775      1,724
Oil (1,000 bbls.)       227,392   1.40    318,260      207,832   1.40    290,965     223,881   1.40    313,433

Totals                                   $386,502                       $347,307                      $417,405
Index                                       100.0                           89.9                         108.0

(Source: Statistical Abstract of U.S., 1941. The depression year of
1935 fell to 89.9 from a base of 100.0 in 1930, but the war year of
1940 rose to 108.0. Since the price weights are held constant, this
index measures quantity production variations. The variations of the
several items are not uniform. Copper and silver production fell a
great deal from 1930 to 1935, but gold rose and oil fell only a
little. The average of all of them is a better indicator of production
change for the state than any single series.)
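The fixed-price-weight computation of Worksheet No. 92 can be sketched in Python. The sketch uses the worksheet's quantities, in its units, and its 1940 price weights. It multiplies each quantity by its price weight and totals the values for each year. It then expresses each year's total as a percentage of the 1930 total. The function and variable names are illustrative.

```python
# Price weights (1940), held constant for every year, from Worksheet No. 92.
prices = {"copper": 0.09, "gold": 35.00, "silver": 0.775, "oil": 1.40}

# Production in the worksheet's units (1,000 lbs., 1,000 ozs., 1,000 bbls.).
production = {
    1930: {"copper": 570_897, "gold": 450, "silver": 1_434, "oil": 227_392},
    1935: {"copper": 278_519, "gold": 870, "silver": 1_065, "oil": 207_832},
    1940: {"copper": 574_533, "gold": 1_444, "silver": 2_225, "oil": 223_881},
}

def quantity_index(production, prices, base):
    """Total Q x P for each year, expressed as a percentage of the base year's total."""
    totals = {year: sum(q * prices[item] for item, q in items.items())
              for year, items in production.items()}
    return {year: round(100 * total / totals[base], 1) for year, total in totals.items()}

print(quantity_index(production, prices, base=1930))
# {1930: 100.0, 1935: 89.8, 1940: 108.0}
```

Recomputing from the printed quantities gives 89.8 for 1935 rather than the worksheet's 89.9. The reason is that the printed 1930 oil value ($318,260) is a little below 227,392 × $1.40 = $318,349.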


SOCIO-ECONOMIC INDEXES 

Frequently it is desired to measure variations in the level of 
living or other socio-economic factors of a number of areas or 
groups. Worksheet No. 93 illustrates a widely used method for 
making such indexes. It is desired to measure the socio-economic 
levels of the farmers of Rhode Island by counties. Five series of 
data are selected for the sample: (1) number of farm automobiles,
(2) number of farm motorcycles, (3) number of tractors on farms, 
(4) number of farm homes lighted with electricity and (5) number 
of farm telephones. Other series no doubt should be included to 
make an adequate index, but these five farm conveniences will 
illustrate the principle just as well. All five of these items con- 
tribute to the productive power, pleasure, or satisfaction of farm 
families, and therefore, tend to measure their socio-economic 
status. 

At once the student faces the difficulty of variation in the size
of counties and of variation in the amount of farm land per
county. If the index is to measure accurately the level of living
of farmers in counties of various areas and numbers of farms,
such data as these must be reduced to some satisfactory common
denominator. A common denominator widely used in such cases is
the per-capita figure, obtained by dividing by the population of
the area. Since in this case we are dealing with farms, urban
population must be excluded and only farm population used. Such
a common base of comparison would be adequate in this case.

Another and perhaps a better measure would be the average
number of each article or service per farm. These figures are
shown in the right-hand column under each county in Worksheet
No. 93. They are obtained by dividing the original data for each
series for each county by the number of farms in that county.
These ratios are summed for each county, and the average of the
five county totals is used as the base for the state.

2.510 + 2.841 + 3.100 + 2.711 + 2.901 = 14.063.

14.063 ÷ 5 = 2.8126 = county average for the state. This is
used as the base.

Each county's total is then divided by this base and multiplied by
100 to obtain its index. For Kent County, 2.841 ÷ 2.8126 × 100 =
101.0.

Bristol County has the lowest level and Newport County the
highest. Providence County is slightly below the state average,
and Kent and Washington Counties are slightly above the average.

WORKSHEET NO. 93

Computation of Socio-Economic Index for Farmers
in Rhode Island, 1940

                              Bristol        Kent          Newport       Providence    Washington
                            Total  No.    Total  No.    Total  No.    Total  No.    Total  No.
Accommodations               No.   per     No.   per     No.   per     No.   per     No.   per
                                   Farm          Farm          Farm          Farm          Farm

Automobiles                   96   .466    342   .753    343   .727    942   .703    416   .766
Motorcycles                   84   .408    233   .513    252   .534    600   .448    284   .523
Tractors                      40   .194    123   .271    183   .388    338   .252    186   .343
Dwellings Electrically
  Lighted                    182   .893    387   .852    415   .879  1,114   .832    413   .761
Telephones                   113   .549    205   .452    270   .572    683   .476    276   .508

No. of Farms                 206           454           472         1,339           543

Totals                            2.510         2.841         3.100         2.711         2.901
Index                              89.2         101.0         110.2          96.4         103.1

Source: U.S. Census of Agriculture, Vol. 1, Part 1, Table X, 1940
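The county index of Worksheet No. 93 can be sketched in Python. To stay close to the worksheet, the sketch starts from the printed per-farm ratios, so the division by number of farms is taken as already done. It sums each county's ratios and averages the five county totals to get the state base. It then expresses each county total as a percentage of that base. The function name is hypothetical.

```python
# Per-farm ratios from Worksheet No. 93, in the order: automobiles,
# motorcycles, tractors, electrically lighted dwellings, telephones.
per_farm = {
    "Bristol":    [0.466, 0.408, 0.194, 0.893, 0.549],
    "Kent":       [0.753, 0.513, 0.271, 0.852, 0.452],
    "Newport":    [0.727, 0.534, 0.388, 0.879, 0.572],
    "Providence": [0.703, 0.448, 0.252, 0.832, 0.476],
    "Washington": [0.766, 0.523, 0.343, 0.761, 0.508],
}

def county_indexes(per_farm):
    """Sum each county's ratios, use the mean county total as base, express totals as percentages."""
    totals = {county: sum(ratios) for county, ratios in per_farm.items()}
    state_base = sum(totals.values()) / len(totals)   # 14.063 / 5 = 2.8126
    return {county: round(100 * t / state_base, 1) for county, t in totals.items()}

print(county_indexes(per_farm))
# {'Bristol': 89.2, 'Kent': 101.0, 'Newport': 110.2, 'Providence': 96.4, 'Washington': 103.1}
```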


TESTS FOR INDEX NUMBERS 

Three rather complicated tests have been devised to check the 
dependability of index numbers. They are (1) the time re- 
versal test, (2) the factor reversal test, and (3) the circular test. 
There is some criticism of the adequacy and value of these tests. 
They are not necessary to the elementary student's understand-
ing of index numbers and are therefore omitted from this text.
The student who may wish to investigate them will find adequate 
treatments in the references at the end of this chapter. 
SUMMARY

1. Index numbers are specialized types of averages. 

2. The making of index numbers requires seven decisions, as follows:

a. What to measure. 

b. The periods or areas to be measured. 

c. The series of data to be included in the index. 

d. Selecting the base. 

e. Determining the type of average to be used. 

f. Selecting the weights for the several series. 

g. Testing the reliability of the index. 

3. Index numbers may be computed to measure: 

a. Changes from one time period to another. 

b. Changes from one area to another. 

4. The base period or area should be an average or normal period or 
area. The mean of the periods covered or the mean of the areas covered 
may be used as the base. 

5. In time series indexes it is preferable to have the base as near the 
present time as is possible to obtain a normal or average value. 

6. Aggregative indexes are weighted by quantities. 

7. Averages of price relatives indexes are weighted by values. Value is
price times quantity.

8. The base of an index number may be changed or switched to an- 
other period or area by dividing the index through by the value of the 
particular period or area which one wishes to serve as the new base. 

9. Index numbers are usually not based on general random sampling 
but on stratified purposive sampling. 
STUDY QUESTIONS

1. Distinguish between index numbers and relatives. What reasons
are there for this distinction? 

2. What problems must be decided before an index number can be 
made? 

3. What are the chief problems in deciding what to measure? Ex- 
plain in detail. 

4. What are the difficulties in deciding on the time units for which 
measurements shall be made? 

5. What rules should be followed in selecting the data for an index 
number? Explain fully. 

6. What period or area should be chosen as the base period or base 
area? Explain fully. 

7. What are the relative advantages of the various types of averages 
as applied to index numbers? Which would you advise using and why? 

8. What is meant by weighting an index number? Can an index 
number be unweighted? Why? 

9. For what areas or periods should weights be chosen? Why? 

10. What kind of weights does an aggregative index require? Why? 

11. What kind of weights does a weighted average of price relatives
require? Why? 

12. What kind of weights does a quantity or volume or production
index require? Explain fully. 

13. How may the dependability of an index number be tested? Ex- 
plain in detail. 

14. What is an aggregative index? 

15. What is an average of relatives index? 

16. How may the base period of a completed index number be shifted 
to another period or area? 

17. Why is this shifting of index bases useful? 

18. What is meant by the statement that the average person con- 
sumes more index numbers than any other kind of statistics? 
EXERCISE IN INDEX NUMBERS

Average Prices of Building Materials*

        Yellow Pine      White Lead       Iron and Steel   Cement       Brick
Year    (per 1000 ft.)   (per 100 lbs.)   (per ton)        (per bbl.)   (per 1000)

1924    $ 99.40          $14.80           $40.87           $1.84        $14.40
1925     106.30           15.63            38.86            1.79         14.01
1926      95.40           15.21            38.32            1.74         13.91
1927      88.30           14.03            36.38            1.68         14.02
1928      86.00           13.25            35.50            1.67         13.72
1929      89.80           13.74            36.48            1.60         13.62
1930      91.70           13.91            33.52            1.60         13.05

Weights: Actual Quantities, 1926

    Yellow Pine        5,000,000,000 ft.
    White Lead           340,000,000 lbs.
    Iron & Steel          94,000,000 tons
    Cement                14,000,000 bbls.
    Brick             10,000,000,000 bricks

* Source: Standard Statistics, 1930.

1. From the data supplied calculate the following index numbers:

a. Unweighted aggregative of actual prices.

b. Weighted aggregative of actual prices, weighted by quantities.

c. Unweighted arithmetic average of price relatives.

d. Weighted arithmetic average of price relatives, weighted by value.

e. Weighted geometric average of price relatives. Weight the logs
of the relatives by value.

f. After the index numbers are calculated, shift the base of indexes
b and d to the last year of the period.

g. Plot all five of the index numbers on millimeter graph paper with
time on the X-axis.





Part Four 

The Analysis of Small Samples 


CHAPTER 20 

THE ANALYSIS OF SMALL 
SAMPLES 


The theory and methods presented in Part II, although they
may be employed with relatively small samples, are designed
primarily for large bodies of data and attain their highest degree
of dependability in large samples. “In samples of fewer than
thirty items, the error is usually so large that it invalidates the
dependability of the computations to a considerable degree. The
reliability of samples tends to improve in proportion to the square
root of their size. To obtain means, standard deviations, regres-
sion, and correlation coefficients that are quite dependable for
purposes of forecasting, samples must be of sufficient size to per-
mit only a small error.”
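The square-root relationship can be illustrated with the usual formula for the standard error of a mean, σ divided by the square root of the sample size. The Python sketch below assumes an illustrative σ of 5.00. It compares samples of 105 and 26 items, the two sizes used in Table 35. Since 105 is roughly four times 26, the standard error of the smaller sample is roughly twice as large.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard error of the mean: sigma divided by the square root of the sample size."""
    return sigma / math.sqrt(n)

sigma = 5.00   # illustrative standard deviation
se_105 = standard_error_of_mean(sigma, 105)
se_26 = standard_error_of_mean(sigma, 26)

print(round(se_105, 3), round(se_26, 3), round(se_26 / se_105, 2))  # 0.488 0.981 2.01
```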

In the field of the social sciences, economics, education, farm 
management, home economics, marketing, finance, population 
problems, sociology, government, taxation, history, psychology, 
labor problems, office practice, military strategy and operations, 
shipping, etc., etc., it is usually quite easy to obtain large samples. 
The various censuses, corporation and governmental reports, 
school reports, and the like, supply large volumes of data. In 
this field, information may be obtained by questionnaires in 
large quantity which may be quite dependable. In some cases 
large samples also may be obtained in the fields of the physical 
and biological sciences. In the various treatments administered to
fruit flies, mice, rats, rabbits, guinea pigs, horses, cattle, and even 
to human beings in many cases, very large quantities of data 
may be obtained and large sample methods used. Even in physics 
and chemistry, sometimes large volumes of data are available. 

In many situations, however, where the data must be gathered 
by costly laboratory or field experiments that often require not 
only large expenditures of money but also long periods of prep- 
aration and development, it is impossible to obtain more than a 
few items of data in many weeks or even months and years. In 
all such cases it is quite necessary to have an adequate technique 
for the analysis of small samples. Even if it is possible to obtain 
large samples by the expenditure of considerable time and money, 
if smaller samples can be made to serve the purpose approximately 
as well, they are a very real economy and advantage. In fact, 
the techniques of analyzing small samples are so valuable and 
necessary that in recent years mathematical statisticians have 
given them much thought and carried them to a high degree of 
perfection. Such methods are so widely used today that it is 
advisable even in an elementary study to introduce the student 
to some of the more simple measures. 

Making decisions on the basis of inadequate data is not a new 
experience in human affairs. Often a jury has to reach a ver- 
dict on the basis of limited circumstantial evidence. Every day 
of our lives we have to make important decisions in business, 
politics, social affairs, and war, with less than all the facts. The 
analysis of small samples, therefore, falls within the need of all 
of us. 


WHAT IS A SMALL SAMPLE? 

There is no clear demarcation line between large and small 
samples. The division is not a point or line, but rather a zone in 
which the one shades off gradually into the other. Some statis- 
ticians have arbitrarily chosen 30 as the division point, but 29 
items are approximately as good as 31 and even 35 is not far re- 
moved from 30. The transition zone may well be considered as 
falling between 25 items and 50 items. One item added to 25
increases the sample 4%, but one item added to 50 increases the
sample only 2%. Another reason for somewhat arbitrarily
placing the upper edge of the small sample zone at 50 is that in
small sample replications based on the Latin square the com-
binations 5 × 5, 6 × 6, and 7 × 7 are quite common. Larger
Latin squares are used but are not as frequent.

PURPOSE OF SMALL SAMPLES 

The main reasons for using small samples are necessity or 
economy. In the physical and biological sciences in which much 
of the data are obtained from elaborate time consuming laboratory 
or field experiments it is often impossible to amass a large volume 
of data. An entire summer or year may be required to obtain 
data on field crops, insects, pests, and blights. In the case of 
trees, several years may be required. For cattle, sheep, hogs, 
horses, and other animals several generations may be necessary 
for some problems. Certain controlled experiments in live- 
stock feeding, in the use of new drugs on human beings, and in 
dietetics and metabolism yield only small quantities of data over 
a long period of time. If such problems had to be solved on the 
basis of large samples — hundreds or thousands of items — little 
progress could be made. In other cases fairly large samples ul- 
timately could be assembled, but only at prohibitive expense. 
If the number of laboratory assistants, the amount of laboratory 
or field equipment, or raw materials can be reduced 50% to 90%, 
from twice to ten times as many studies can be made. In cases 
in which either necessity or excessive costs stand in the way of 
scientific advancement the small sample methods are of great 
value. 

HOW SMALL SAMPLES ARE OBTAINED 

In Chapter 14 four methods of sampling were explained.
Purposive sampling and “Stratified Purposive” sampling were
illustrated as applying to large bodies of data. In both of those
cases certain controls were introduced to make a smaller sample
more truly representative of its population. The superior, con-
trolled, small “Gallup” and “Fortune” polls of only a few thou-
sand items were compared with the much larger but inferior for-
mer “Literary Digest” polls of millions of items. In collecting
small samples, additional, much more exact and refined controls
are used. These make it possible to measure certain characteristics
of elusive populations with a high degree of accuracy from samples
of only a few properly chosen items. In Chapter 4 the plan-
ning of a statistical study was emphasized. Although much care-
ful forethought is essential to effective large sample studies, it is
doubly necessary in small sample analysis. In many cases,
small samples simply cannot be found or taken. They must be
made.

Experimental Design 

Satisfactory small sampling procedures must conform to two 
principles. First, the particular and specific variation or variations 
which it is desired to measure must be separated from each other 
and from all residual variations. Second, an adequate basis for 
computing error must be established. The most effective devices 
so far discovered for achieving these two ends are known as ex- 
perimental design. The experiment which is to supply the de- 
sired data is planned and set up in a form which will segregate 
and reveal the particular type of variation which is to be measured. 
This method has been more fully developed in the fields of agri- 
culture than elsewhere, although the principles have a wide ap- 
plication. 

If it is desired to measure the difference in yield between two 
varieties of wheat, the experiment is set up to eliminate or to 
compute out all other causes of variations as far as possible. Wheat 
yields may vary because of differences in (1) soils, (2) fertilizers, 
(3) time of planting, (4) condition of seed bed, (5) moisture in 
soil, (6) amount of seed per square foot, (7) evenness of seed dis- 
tribution, (8) quality of seed, and many other factors. To meas- 
ure accurately real differences in yield due to variety all other 
factors must be held as near constant as possible. It is the same 
type of problem the physicist faces in the laboratory when he 
attempts to measure the relationship between the temperature 
and the pressure of a gas. The physicist must hold all other fac-
tors, including volume, constant. The statistician must design his
experiment so as to eliminate all other variations or to segregate
them and to compute them out of his results.

One side of the field may be poorer soil than the other, or 
drainage may be better, or exposure to wind or heat may be 
unequal, or many other similar factors may obscure the true re- 
sults. A poor yielding variety on good soil might produce more 
than a good yielding variety on poor soil. The experiment must 
be designed to eliminate the effects of all such variations as be- 
tween the two varieties of wheat. 

Randomized Blocks or Rows 

The field may be laid off in a number of blocks, each of which
contains the same number of rows, as in Fig. 90. If, for instance,
the experiment was designed to test differences in yield of five
varieties of wheat, there would be five rows in each block, distrib-
uted in the several blocks in random order. Suppose Block 1 were
on the more favorable side of a field that grew steadily poorer
toward the opposite side, where Block 5 lay. Variety A of wheat
would then be planted at random in all the various locations in the
five blocks, so that the advantages and disadvantages of location
would be cancelled out. Variety B and all the others would have
equal soil advantages with A. One limitation of the row method is
that the field may also vary in soil or other respects diagonally, so
that there is an unequal variation within the rows from one end to
the other. If the rows were very long, this might affect the several
varieties unequally. This difficulty might be partially overcome by
running the rows diagonally across the field or otherwise varying
their contours, direction, or length.

Fig. 90. Randomized blocks in field test. By this
method the five varieties are distributed among the
five blocks in such a random order that all varieties
appear in all possible locations.
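A minimal Python sketch of a randomized-block layout follows. It assumes five varieties labeled A to E and five blocks. Each block receives every variety once, in an independently shuffled row order. The function name and the fixed seed are illustrative choices, not part of the text; the seed is used only so that a run can be repeated.

```python
import random

def randomized_blocks(varieties, n_blocks, seed=None):
    """Return one independently shuffled row order of all varieties for each block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(varieties)
        rng.shuffle(order)          # fresh random order within this block
        layout.append(order)
    return layout

plan = randomized_blocks("ABCDE", n_blocks=5, seed=1)
for number, rows in enumerate(plan, start=1):
    print(f"Block {number}: {' '.join(rows)}")
```

Shuffling each block independently randomizes location within every block. It does not, however, guarantee that each variety occupies every row position exactly once across the blocks. The Latin square adds that constraint.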

The Latin Square 

Under many conditions some form of the Latin square is a more
effective control in experimental design. The Latin square is
based on the relationship n × n: it has as many rows as columns.
If, for instance, four types of fertilizer were to be applied to one
variety of cotton, the square would be laid off 5 × 5. It would have
one block in each row for each kind of fertilizer and one check
block with no fertilizer. If the four fertilizers were designated as
A, B, C, and D, and the no-fertilizer block as E, the Latin square
could be set up as follows:



                  Columns
Rows      1    2    3    4    5

  1       A    C    B    D    E
  2       D    E    C    A    B
  3       B    A    D    E    C
  4       E    B    A    C    D
  5       C    D    E    B    A
By this device the field or plot is divided up into as many rows
as columns in order that each type of fertilizer may be applied 
to each section of the field so that whatever relationship or varia- 
tion there might be between kind of fertilizer and soil condition 
would be equally distributed. All kinds of fertilizer would be 
applied on all types of soils. 

This design could be applied to five types of fertilizer used 
with five varieties of cotton so that each variety of cotton would 
be planted once with each kind of fertilizer. If the five varieties 
of cotton were each indicated by an Arabic digit as 1, 2, 3, 4, and
5, the Latin square would be as follows:



Columns 

Rows 

1 

2 

3 

4 

5 

1 

Ax 


Cz 

Dx 

Ez 

2 

B, 

c, 

Dx 

E2 

Az 

3 

C2 


Ex 

A5 

Bx 

4 

D, 

Ex 

Az 

Bz 

Cx 

5 

Ez 

Ax 

B, 

Cx 

Dz 


This form of the Latin square is not desirable, as R. A. Fisher 
points out, because the letters fall in too uniform a pattern. The 
fact that all the E’s run in one diagonal line through the middle 
of the square violates the principle of random sampling which 
should be followed strictly. The A’s and the D’s are only a little 
less biased in location. All the letters and numbers should be 
scattered throughout the square in random order. This random 
design is necessary to establish a correct basis for random error. 
A strictly random order may be obtained for the several letters 
and numbers by shifting by chance the location of either the rows 
or columns or both. The following Latin square is made from
the one above by shifting Row 5 in the square above to the posi-
tion of Row 2, shifting Row 2 in the square above to the po-
sition of Row 4, and shifting Row 4 to Row 5.
Columns

Rows 

1 

2 

3 

4 

5 

1 


Ba 

Ca 

Di 

Ea 

2 

Ea 

A4 

Ba 

Cl 

Da 

3 

c. 

Da 

E, 

As 

B, 

4 


Ca 

D, 

Ea 

As 

5 

Da 

E, 

A2 

Ba 

C4 


If Column 1 in this square changed position with Column 4,
or Column 2 changed position with Column 5, a still better ran-
domization would, perhaps, be obtained.

Latin squares may be set up for any value of n, but in actual
practice the number of varieties and variables which it is con-
venient to include in one problem usually limits it to 4 × 4, 5 × 5,
6 × 6, 7 × 7, or at most 8 × 8.
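The row and column shuffling described above can be sketched in Python. The sketch starts from one standard cyclic construction for a prime size such as 5. This is an assumption, not necessarily how the squares in this chapter were built. The letter in row i, column j is taken from position (i + j) mod 5 and the digit from position (2i + j) mod 5, which pairs every letter with every digit exactly once. The sketch then shuffles whole rows and whole columns at random, which preserves all of these properties. The function names are hypothetical.

```python
import random

def graeco_latin_5x5(letters="ABCDE", digits="12345"):
    """Pair two orthogonal cyclic squares: letter (i + j) mod 5, digit (2i + j) mod 5."""
    n = 5
    return [[letters[(i + j) % n] + digits[(2 * i + j) % n] for j in range(n)]
            for i in range(n)]

def randomize(square, seed=None):
    """Shuffle whole rows, then whole columns; each symbol stays once per row and column."""
    rng = random.Random(seed)
    rows = [row[:] for row in square]
    rng.shuffle(rows)
    order = list(range(len(square)))
    rng.shuffle(order)
    return [[row[c] for c in order] for row in rows]

def is_latin(square, pick):
    """True if the symbols selected by `pick` occur once in every row and every column."""
    n = len(square)
    rows_ok = all(len({pick(cell) for cell in row}) == n for row in square)
    cols_ok = all(len({pick(square[i][j]) for i in range(n)}) == n for j in range(n))
    return rows_ok and cols_ok

square = randomize(graeco_latin_5x5(), seed=7)
for row in square:
    print(" ".join(row))

letters_ok = is_latin(square, lambda cell: cell[0])   # fertilizers
digits_ok = is_latin(square, lambda cell: cell[1])    # varieties
all_pairs = len({cell for row in square for cell in row}) == 25
print(letters_ok, digits_ok, all_pairs)  # True True True
```

Shuffling the rows and columns of one starting square does not choose uniformly among all possible 5 × 5 squares. It does, however, usually break up regular diagonal patterns like the one criticized above.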

Computation of Error 

In Chapter 14 problems of sampling and error were con-
sidered. It was shown that standard errors are based on random
sampling and pure-chance variation as expressed in the normal 
curve of error. It is, of course, likely that a small sample of 
from four to ten items, or even from 12 to 30 items will not de- 
scribe a very smooth normal curve. The sample is too small. 
But the computation of a dependable measure of error still de- 
pends on the normal random distribution of the items in the form 
of the normal curve. The experimental design from which the 
small sample is obtained must be so managed that it is free 
from any bias or abnormality which would violate the assump- 
tions underlying the normal distribution curve. 

EFFECT OF SMALL SAMPLES ON STANDARD MEASURE 

The fact that a sample is small and its error large need not 
invalidate it if we are able to measure that error with sufficient 
accuracy and to make necessary adjustments for the error. An
excellent example of this fact is the magnetic needle of the sur-
veyor's or mariner's compass. The fact that the earth's magnetic
poles do not coincide with the poles of the earth’s axis prevents 
the magnetic needle from pointing directly north and south in 
most parts of the earth. But the fact that mariners and sur- 
veyors are able to compute accurately the amount of error enables 
them to steer their ships and run their lines as accurately as if 
there were no error in the compass. The same principle applies 
to all statistical computations. The increased error because of 
small samples need not invalidate our work if we can compute 
accurately the amount of adjustment necessary to compensate 
for it. 

Our first problem is to determine how large the errors are for
all the principal statistical measures (means, standard deviations,
and coefficients of regression and correlation) for various sizes of
small samples. These relationships are illustrated and, perhaps,
to some extent, demonstrated by the computations in the fol-
lowing tables. Nine large samples of 105 items were taken from
the Stillwater, Oklahoma, grade school population, and nine
smaller samples of 26 items each were also taken from the same
parent population. For both the height and the weight of the
children in both sets of samples, the following were computed:
the arithmetic mean, the standard error of the mean, the standard
deviation, the standard error of the standard deviation, the
coefficients of correlation and determination, the standard error
of the coefficient of correlation, and the regression coefficient.

TABLE 35

Comparison of 9 Large and 9 Small Samples from Stillwater,
Oklahoma, Grade School Children

              9 Samples of 105               9 Samples of 26
Measure    Largest  Smallest   Range     Largest  Smallest   Range

X̄            52.96     52.31     .65       53.38     51.81    1.54
σ_X̄           .488      .439    .049        1.06       .83     .22
σ_x           5.00      4.48     .52        5.26      4.45     .81
σ_σx          .345      .315    .030        .737       .63    .107
Ȳ            66.00     64.29    1.71       70.23     62.54    7.69
σ_Ȳ           1.99      1.57     .42        5.18      2.96    2.22
σ_y          20.28     15.12    5.16       21.25     12.71    8.54
σ_σy         1.403     1.042    .361        4.25      2.54    1.71
r             .907      .826    .081        .950      .749    .201
r²            .823      .682    .141        .902      .562    .340
σ_r           .031      .017    .014        .088      .019    .069
b_yx          3.69      2.90     .79        4.51      2.59    1.92

Ratio of Range of 26-Item Samples to 105-Item Samples

           Range of Errors                         Range of Errors
Measure   105 Items  26 Items  Ratio    Measure   105 Items  26 Items  Ratio

X̄            .65      1.54     2.37     σ_y         5.16      8.54     1.66
σ_X̄         .049       .22     4.49     σ_σy        .361      1.71     4.74
σ_x          .52       .81     1.56     r           .081      .201     2.46
σ_σx        .030      .107     3.57     r²          .141      .340     2.34
Ȳ           1.71      7.69     4.50     σ_r         .014      .069     4.93
σ_Ȳ          .42      2.22     5.30     b_yx         .79      1.92     2.43

It will be noted that in every case the range of error is larger
for the 26-item samples than for the 105-item samples. The errors
for the various measures of the smaller samples are from 1.56 to
5.30 times as large as the corresponding errors of the larger
samples. This illustrates three points. First, the larger the sample,
the smaller the error. Second, larger samples are to be preferred to
smaller ones if the labor and cost are not prohibitive. Third, if we
can compute the error of a small sample with a high degree of
accuracy, it may be used to great advantage in many cases.

The following generalizations may be made from the computa- 
tions in the above tables: 

1. The means of small samples tend to fall in about equal num- 
bers on either side of the true mean, but the smaller the samples 
are, the more likely the means are to scatter over a wider range. 

2. The standard deviations of small samples tend to be smaller
than the true standard deviation of the population from which
they are drawn, and the smaller the samples, the smaller the stand-
ard deviation tends to be.

3. Since the standard deviations of small samples tend to be 
smaller than the parameter of the population, all errors based on 
these smaller standard deviations tend to be too small. 

4. Unless corrections are made for the errors of small samples,
they always tend to understate the actual error. 
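A small simulation can check the second and third generalizations. The Python sketch below is illustrative only and rests on assumptions that are not in the text: a normal parent population whose true standard deviation is 1, a fixed random seed, and the hypothetical helper name `average_sd`. For samples of 6, 26 and 105 items, it averages the standard deviation computed with divisor N over many random draws. Each average should fall short of 1, and the shortfall should grow as the sample gets smaller.

```python
import math
import random


def average_sd(n, trials=5_000, seed=1):
    """Average, over many random samples of n items drawn from a normal
    population whose true standard deviation is 1, of the sample standard
    deviation computed with divisor N (no degrees-of-freedom correction)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        total += math.sqrt(sum((v - mean) ** 2 for v in sample) / n)
    return total / trials


for n in (6, 26, 105):
    print(f"N = {n:3d}: average sample standard deviation = {average_sd(n):.3f}")
```

The seed is fixed, so the run is repeatable. For samples of 6 the shortfall is roughly 13 percent; for samples of 105 it is under 1 percent.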

Fortunately there is available an accurate method of computing 
the increased error of small samples which is easy to apply. It 
may be stated briefly as follows: 

Let $M$ = the true mean of the population
$\bar X$ = the mean of a single sample
$\sigma_{\bar X}$ = the standard error of the mean

$$T = \frac{\bar X - M}{\sigma_{\bar X}}$$

or the difference between the sample mean and the population
mean in terms of the standard error of the mean. When the
standard error of the mean, $\sigma_{\bar X}$, is based on a large number of
samples, as is shown in Chapter 14, page 343, or is computed
from the true standard deviation of the population, $T$ is a correct
and adequate measure of the deviation of $\bar X$ from $M$. But when
$\sigma_{\bar X}$ is computed from one small sample, it is not a correct measure
of this deviation. It must be corrected for smallness of sample.
This correction is possible by the following method:

Let $s$ = the standard deviation of the small sample
$s_{\bar X}$ = the standard error of the mean of the small sample
$x = \bar X - M$, the difference between the sample mean and
the true population mean, and

$$t = \frac{x}{s_{\bar X}}$$

or the measure of the deviation between $\bar X$ and $M$ in terms of the
standard error of the small-sample mean. Then we may derive $t$,
the measure for the small sample, from $T$, the measure for the
large sample:

$$t = \frac{x}{s_{\bar X}} = \frac{x}{\sigma_{\bar X}} \cdot \frac{\sigma_{\bar X}}{s_{\bar X}} = T \cdot \frac{\sigma_{\bar X}}{s_{\bar X}}$$

The ratio of $t$ to $T$ is thus the ratio of the standard error of
the true mean, $\sigma_{\bar X}$, to the standard error of the small sample,
$s_{\bar X}$. The behavior of this ratio is known. The mathematical
statisticians have discovered and measured it for all sizes of small
samples through experiments and tests based on pure chance.

These corrections do not have to be computed by the student or 
statistician who uses them. For small samples of various sizes, 
they may be read from Table 36. 
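A simulation shows why these corrections matter. The Python sketch below is a hypothetical illustration, not a method from the text. It draws many samples of 6 items from a normal population with a known mean $M = 0$ and computes $t = (\bar X - M)/s_{\bar X}$ for each. It takes $s$ with divisor $N - 1$ and $s_{\bar X} = s/\sqrt{N}$, an assumed (standard) convention. It then counts how often $t$ falls beyond a cutoff:

- With the large-sample cutoff 1.96, the proportion runs to roughly 10 or 11 percent instead of 5.
- With 2.571, the Table 36 value for $n = 5$ and $P = .05$, it comes back to about 5 percent.

The name `tail_fraction` is illustrative.

```python
import math
import random


def tail_fraction(n, cutoff, trials=20_000, seed=2):
    """Fraction of random samples of n items whose t = (Xbar - M) / s_Xbar
    lies beyond +/- cutoff, for a normal population with known mean M = 0."""
    rng = random.Random(seed)
    population_mean = 0.0
    beyond = 0
    for _ in range(trials):
        sample = [rng.gauss(population_mean, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        s = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))  # divisor N - 1
        t = (mean - population_mean) / (s / math.sqrt(n))              # s_Xbar = s / sqrt(N)
        if abs(t) > cutoff:
            beyond += 1
    return beyond / trials


# Large-sample 5% point versus the Table 36 value for n = 5, P = .05.
for cutoff in (1.96, 2.571):
    print(f"N = 6, |t| > {cutoff}: {tail_fraction(6, cutoff):.3f}")
```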

It is not possible to prove that a sample could never be an
exception. To do this, the sample would have to be infinitely
large; it would have to include the entire population. Otherwise
the chance would always remain that once in a thousand, or in ten
thousand, or in a million times there would be an exception.
But it can be logically and safely inferred that if the differences
are sufficiently few and far apart, they may be accounted for by
chance sampling and do not prove that the sample fails to represent
the population adequately. If the differences between two
samples are so great that they cannot be accounted for by chance
sampling within certain wide limits, it is safe to infer that the
samples represent two different populations. From the statistical
standpoint this method is made easier if we assume that there is no
difference between the samples and then prove this assumption false
by showing that the difference is too great to be accounted for
logically by chance sampling.


THE NULL HYPOTHESIS 

The method of approaching this analysis is to set up a hy- 
pothesis of no difference, a null hypothesis. Suppose that an ex- 
periment has been set up to study the results of two methods of 
teaching reading in the second grade. A given number of pupils 
with the same intelligence quotients and cultural background, 
age, and other characteristics, would be chosen to participate in 
the experiment. The pupils would be divided into two equal
groups. For a given period, perhaps one term, each group would
be taught to read by a method different from that used with the
other group, and comparisons would be made between the two
groups at the end of the term to determine which class had gained
more in vocabulary, speed, and comprehension in reading.
The experimenter would assume that there was no difference in the
results of the two methods, and then subject his assumption to
proof. He would assume the null hypothesis and then proceed to
disprove it, if he could. If the tests at the end of the term showed
a difference in reading attainment too large to be accounted for by
chance sampling, the null hypothesis would be discarded and a
superior method of teaching reading demonstrated. If the tests
showed only a difference small enough to be accounted for by chance,
the null hypothesis would be sustained and no superior method of
teaching reading discovered. Any planned experiment would be set up on the same
principle. If we wished to prove that keeping bright electric 
lights on hens all night would result in less hen time spent on 
the roost and more hen time spent in exercise on the floor eating 
and in egg production, we would begin by assuming the null hy- 
pothesis, or that such a procedure would make no difference in 
egg production. We would then select two equal groups of hens 
identical in all particulars, permitting one group to be in the 
dark at night while the other group was kept in the light. The 
comparative number of eggs produced by the end of a fixed period 
would determine whether our hypothesis of no difference was 
sustained or discarded. 

DEGREES OF FREEDOM 

Earlier in this chapter the fact that the standard deviation of 
small samples tends to be too small was presented. A device which 
will aid in correcting this bias in error is the practice of using
(N − 1) or (N − m) for N in dividing the squared deviations from
the mean. The explanation of this practice rests upon what the
mathematicians call “degrees of freedom.” This idea is based on
the fact that the sum of the deviations from a mean must always 
total zero (0). This point may be demonstrated as follows: 
   X  −  X̄  =   x
   9  −  10  =  −1
  12  −  10  =  +2
   8  −  10  =  −2
   7  −  10  =  −3
  13  −  10  =  +3
  11  −  10  =  +1
  60              0

$$\bar X = \frac{\Sigma X}{N} = \frac{60}{6} = 10, \qquad \Sigma x = 0 \text{ (always)}$$


Since the deviations from the arithmetic mean must total zero,
it follows that the freedom of variation of such deviations is
limited. One of the deviations is not free to vary, but must
compensate algebraically for the values of the others. If in a
distribution of six items five deviations from the mean were −7,
+2, +1, +8, −9, the sixth would have to be +5 to make the sum of
deviations zero. It is not a question of which deviation is not
free to vary. It is merely a fact that if all the others are free to
vary, this particular one is not free. The degrees of freedom, as
they are called, are one less than the number of items. For the
sum of the squared deviations of a single variable, the divisor is
(N − 1). For cases in which the sums of the squared deviations of
two or more variables are included in a grand total, the formula
requires (N − m), in which m = the number of variables. The
principle of “degrees of freedom” in the sum of deviations from a
mean is always correct theoretically, but it is of little practical
significance in treating large samples. For instance, (500 − 1), or
499, is little different from 500, but (5 − 1), or 4, is twenty
percent less than five. In small samples, therefore, it is very
necessary to restrict our computations by “degrees of freedom,”
signified by (df), in order not to understate our error.
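The six-item demonstration above can be replayed in a few lines of Python, as a minimal sketch. It confirms that the deviations total zero. It also shows how much the divisor matters for so small a sample:

- Dividing the sum of squared deviations, 28, by N = 6 gives a standard deviation of about 2.16.
- Dividing by N − 1 = 5 gives about 2.37.

The standard library's `statistics.pstdev` divides by N and `statistics.stdev` by N − 1, which the two checks in the block use.

```python
import math
import statistics

# The six items of the demonstration above.
items = [9, 12, 8, 7, 13, 11]
n = len(items)
mean = sum(items) / n                     # 60 / 6 = 10.0
deviations = [v - mean for v in items]    # -1, +2, -2, -3, +3, +1
ss = sum(d * d for d in deviations)       # sum of squared deviations

sd_n = math.sqrt(ss / n)                  # divisor N
sd_df = math.sqrt(ss / (n - 1))           # divisor N - 1, the degrees of freedom

# The standard library draws the same distinction.
assert math.isclose(sd_n, statistics.pstdev(items))
assert math.isclose(sd_df, statistics.stdev(items))

print(sum(deviations), ss, round(sd_n, 3), round(sd_df, 3))
```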


TEST OF SIGNIFICANCE OF DIFFERENCES 
BETWEEN MEANS 

The use of the table of t-values in the analysis of small samples
may be illustrated by taking samples of 6 items each from the
first, second, and third grades of the heights of Stillwater,
Oklahoma, school children. Up to this point it has been assumed that
the heights of the children of the first six grades composed a
single population suitable for statistical analysis. In fact, up to
this point, it has been subjected to a great deal of analysis, from
means to regression and correlation coefficients. But now we
wish to question the assumption that it is a single population
and subject this assumption to proof. We wish to show that it
is, in fact, six separate populations as far as height is concerned:
a separate population for each grade. We shall attempt this test
by means of small samples of 6 items each from the first three
grades. The same type of test could be applied to all six grades.


WORKSHEET NO. 94

            Grade 1    Grade 2    Grade 3
            Heights    Heights    Heights
              48         49         50
              43         48         52
              47         50         52
              44         48         54
              46         47         48
              43         46         51
Sums         271        288        307
X̄          45.167      48.0      51.167
Σx²        22.567      10.0      20.833
s²          4.633       2.0       4.167
s           2.152       1.414     2.04
s_X̄          .9624      .6324     .9123

In which:

Σx² = sum of squared deviations from means
s = standard deviations of small samples
s² = squared standard deviations
s_X̄ = standard error of mean of small samples
s₁₋₂ = standard deviation of differences of two samples

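Then, by Formula No. 74,

$$s_{1-2} = \sqrt{\frac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 + N_2 - 2}} = \sqrt{\frac{22.567 + 10}{6 + 6 - 2}} = \sqrt{\frac{32.567}{10}} = \sqrt{3.2567} = 1.8$$

and by Formula No. 75,

$$t = \frac{\bar X_1 - \bar X_2}{s_{1-2}}\sqrt{\frac{N_1 N_2}{N_1 + N_2}} = \frac{48.0 - 45.16}{1.8}\sqrt{\frac{36}{12}} = \frac{2.84}{1.8}\sqrt{3} = 1.58 \times 1.732 = 2.737$$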


Turning to Table 36, where n = 5 and P = .05, we find t = 2.571.
This figure, 2.571, means that for a sample of 6 (N − 1 is 5), the
probability is 5%, or only one case out of 20, that so large a
difference could have occurred from chance. In other words, a t of
2.571 in a sample of 6 items means that the chances are 20 to 1
that the two small samples were drawn from separate parent
populations. Since our t of 2.737 exceeds this value, the heights of
first-grade children and second-grade children appear to be two
separate universes. If the t-value had reached 4.032, the value for
P = .01 with 5 degrees of freedom, it would have indicated that
there was only 1 chance in 100 that so large a difference between
the sample means could have occurred from chance sampling.
Statisticians designate 1 chance out of 20, or 95% probability, as
significant. A 1 out of 100 chance, or 99% probability, they call
highly significant. In the problem above, t = 2.737, which is more
than the significant limit of 2.571 but less than the highly
significant value of 4.032. Our t of 2.737, based on samples of 6
items, is perhaps not conclusive proof that the two grades are
separate populations, but it is a strong indication that this is the
case. Samples of 10 items would probably yield more conclusive
proof. Here is a case in which two tiny samples of 6 items each
have revealed significant information about the heights of school
children.

If we wish to make the comparison between grades two and 
three, the computations are as follows: 


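$$s_{1-2} = \sqrt{\frac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 + N_2 - 2}} = \sqrt{\frac{20.833 + 10}{6 + 6 - 2}} = \sqrt{\frac{30.833}{10}} = \sqrt{3.0833} = 1.76$$

$$t = \frac{\bar X_1 - \bar X_2}{s_{1-2}}\sqrt{\frac{N_1 N_2}{N_1 + N_2}} = \frac{51.167 - 48}{1.76}\sqrt{\frac{36}{12}} = 1.8\sqrt{3} = 1.8 \times 1.732 = 3.118$$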

Looking again at Table 36, where n = 5 and P = .05, we find
t = 2.571, which is considerably smaller than the t = 3.118 in our
problem. Because 3.118 is still below the highly significant value
of 4.032, we conclude that the difference in the heights of
second-grade and third-grade school children is significant, but
not highly significant.

If, however, we wish to make a comparison between the first 
and third grades, we find that the difference is highly significant. 

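$$s_{1-2} = \sqrt{\frac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 + N_2 - 2}} = \sqrt{\frac{22.567 + 20.833}{6 + 6 - 2}} = \sqrt{4.34} = 2.083$$

$$t = \frac{\bar X_1 - \bar X_2}{s_{1-2}}\sqrt{\frac{N_1 N_2}{N_1 + N_2}} = \frac{51.167 - 45.167}{2.083}\sqrt{\frac{36}{12}} = \frac{6}{2.083}\sqrt{3} = 2.89 \times 1.732 = 5.005$$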

Checking again with Table 36, we find for n = 5 and P = .05
that t = 2.571. The t-value of 5.005 in the problem above is
nearly twice this amount and is, therefore, significant. In fact,
it is larger than the highly significant value of t = 4.032. We
conclude, therefore, that the two small samples for the first grade
and the third grade could hardly have been taken from the same
parent population. We have learned from our three small samples
that the grade-school population is not a single homogeneous
population but a conglomerate of several populations. Samples
from the heights of the pupils in the other grades would probably
have revealed the same facts. The point in this analysis is not
that this information could not have been obtained from large
samples from each grade. Of course it could have been learned
from large samples, and with greater certainty than from small
ones. The point is that it was obtained from very small samples
of 6 items each with the aid of Table 36. This illustrates the
real economy of time and money that may be achieved through the
careful analysis of small samples. It opens to statistical analysis
the whole field of planned experimental research, which must
often depend entirely on small samples. Although these small
samples were taken at random, they were not produced by
experimental design.
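A minimal Python sketch of Formulas No. 74 and 75 follows, applied to the heights of Worksheet No. 94. The dictionary and function names are illustrative. The sketch works from the raw heights without rounding. For Grade 1 the raw heights give Σx² = 22.833, the figure Worksheet No. 95 lists for the same heights, rather than 22.567. It therefore gives t of about 2.708, 3.124 and 4.973 for the three comparisons. Checked against 2.571 and 4.032, these lead to the same conclusions as the worksheet figures.

```python
import math

# Heights from Worksheet No. 94 (six pupils per grade).
HEIGHTS = {
    1: [48, 43, 47, 44, 46, 43],
    2: [49, 48, 50, 48, 47, 46],
    3: [50, 52, 52, 54, 48, 51],
}


def sum_sq_dev(values):
    """Sum of squared deviations from the sample mean (the worksheet's sum of x squared)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def pooled_t(a, b):
    """t for the difference between the means of samples a and b."""
    n1, n2 = len(a), len(b)
    # Formula No. 74: pooled standard deviation on N1 + N2 - 2 degrees of freedom.
    s12 = math.sqrt((sum_sq_dev(a) + sum_sq_dev(b)) / (n1 + n2 - 2))
    diff = sum(b) / n2 - sum(a) / n1
    # Formula No. 75.
    return diff / s12 * math.sqrt(n1 * n2 / (n1 + n2))


for g1, g2 in [(1, 2), (2, 3), (1, 3)]:
    print(f"grade {g1} vs grade {g2}: t = {pooled_t(HEIGHTS[g1], HEIGHTS[g2]):.3f}")
```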


TEST OF REGRESSION COEFFICIENTS 

Small samples may be used not only to test the similarity and 
reliability of means to determine whether the samples were taken 
from the same parent population, but also to test the degree of 
similarity between regression coefficients. In connection with 
the small samples of 6 items used above to test the similarity of 
sample means, we may raise the question as to whether the rela- 
tionship between height and weight is the same for these three 
grades. Does the weight of second-grade children increase in 
proportion to the increase of their height at approximately the 
same ratio that holds for first-grade children? Do they tend to 
grow faster in height than they do in weight? To what extent do 
the regression coefficients between height and weight for these 
three grades coincide? This test also can be made from small 
samples. 


WORKSHEET NO. 95

             1st Grade             2d Grade              3rd Grade
          X Height  Y Weight    X Height  Y Weight    X Height  Y Weight
             48        56          49        55          50        49
             43        48          48        50          52        62
             47        58          50        52          52        57
             44        46          48        49          54        59
             46        50          47        47          48        50
             43        46          46        49          51        58
Sums        271       304         288       302         307       335

              X²        Y²          X²        Y²          X²        Y²
            2,304     3,136       2,401     3,025       2,500     2,401
            1,849     2,304       2,304     2,500       2,704     3,844
            2,209     3,364       2,500     2,704       2,704     3,249
            1,936     2,116       2,304     2,401       2,916     3,481
            2,116     2,500       2,209     2,209       2,304     2,500
            1,849     2,116       2,116     2,401       2,601     3,364
Sums       12,263    15,536      13,834    15,240      15,729    18,839
(ΣX)²/N    12,240.16 15,402.33   13,824    15,200.67   15,708.16 18,704.16
           Σx₁² = 22.83          Σx₂² = 10.0           Σx₃² = 20.83
           Σy₁² = 133.67         Σy₂² = 39.33          Σy₃² = 134.83

                 XY                    XY                    XY
               2,688                 2,695                 2,450
               2,064                 2,400                 3,224
               2,726                 2,600                 2,964
               2,024                 2,352                 3,186
               2,300                 2,209                 2,400
               1,978                 2,254                 2,958
Sums          13,780                14,510                17,182
ΣXΣY/N        13,730.67             14,496                17,140.83
           Σx₁y₁ = 49.33         Σx₂y₂ = 14.0          Σx₃y₃ = 41.16


Formula No. 76

$$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N}$$

Formula No. 77

$$t = \frac{b_1 - b_2}{\sigma_{b_1 - b_2}}$$

Formula No. 78

$$\sigma_{b_1 - b_2} = \sqrt{\frac{s_1^2}{\Sigma x_1^2} + \frac{s_2^2}{\Sigma x_2^2}}$$

Formula No. 79

$$s^2 = \frac{\Sigma (Y_1 - Y_1')^2 + \Sigma (Y_2 - Y_2')^2}{N_1 + N_2 - 4}$$

Computations of t

Σx₁² = 22.83     Σy₁² = 133.67     Σx₁y₁ = 49.33
Σx₂² = 10.0      Σy₂² = 39.33      Σx₂y₂ = 14.0
Σx₃² = 20.83     Σy₃² = 134.83     Σx₃y₃ = 41.16

$$s_1^2 = \frac{\Sigma (Y_1 - Y_1')^2}{N - 1} = 3.124, \qquad s_2^2 = \frac{\Sigma (Y_2 - Y_2')^2}{N - 1} = 3.168, \qquad s_3^2 = \frac{\Sigma (Y_3 - Y_3')^2}{N - 1} = 13.322$$

$$\sigma_{b_1 - b_2} = \sqrt{\frac{3.124}{22.83} + \frac{3.168}{10}} = \sqrt{.1368 + .3168} = \sqrt{.4536} = .673$$

$$\sigma_{b_2 - b_3} = \sqrt{\frac{3.168}{10} + \frac{13.322}{20.83}} = \sqrt{.3168 + .6395} = \sqrt{.9563} = .977$$

$$\sigma_{b_1 - b_3} = \sqrt{\frac{3.124}{22.83} + \frac{13.322}{20.83}} = \sqrt{.1368 + .6395} = \sqrt{.7763} = .881$$

$$b_1 = \frac{49.33}{22.83} = 2.161, \qquad b_2 = \frac{14.0}{10.0} = 1.400, \qquad b_3 = \frac{41.16}{20.83} = 1.976$$

$$t = \frac{2.161 - 1.400}{.673} = \frac{.761}{.673} = 1.131$$

$$t = \frac{1.976 - 1.400}{.977} = \frac{.576}{.977} = .590$$

$$t = \frac{2.161 - 1.976}{.881} = \frac{.185}{.881} = .210$$

Checking the .05 probability column of Table 36 for N − 2, or 4,
degrees of freedom, we find t = 2.776. Our t-value for the
difference between $b_1$ and $b_2$ is only 1.131, which is less than
one-half the requirement for a significant difference in regression
lines. We conclude, therefore, that there is no significant
difference between the first grade and the second grade in the
increase of the weight of children in proportion to their height.
The t-values for the differences between the second and third
grades and between the first and third grades are still smaller and
are, therefore, not significant. This means that the rate of
increase in weight per inch of height is about the same regardless
of age among these children; t shows only non-significant values
for this regression.
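A minimal Python sketch of the slope comparison follows, computed directly from the heights and weights of Worksheet No. 95. It takes the slope $b = \Sigma xy / \Sigma x^2$ for each grade. It forms $\sigma_{b_1 - b_2}$ on the same square-root-of-summed-terms pattern as the computations above.

One assumption is not taken from the text. For each grade the sketch uses the residual variance $s^2 = \Sigma(Y - Y')^2/(N - 2)$, a common textbook choice that is not necessarily the one behind the worksheet's $s^2$ figures. Its t-values, about .86, −.54 and .19, therefore differ from the hand computations, but none approaches 2.776. The names `slope_stats` and `t_slopes` are illustrative.

```python
import math

# Worksheet No. 95: (heights X, weights Y) for six pupils in each grade.
GRADES = {
    1: ([48, 43, 47, 44, 46, 43], [56, 48, 58, 46, 50, 46]),
    2: ([49, 48, 50, 48, 47, 46], [55, 50, 52, 49, 47, 49]),
    3: ([50, 52, 52, 54, 48, 51], [49, 62, 57, 59, 50, 58]),
}


def slope_stats(xs, ys):
    """Return (b, s2, sxx): slope of Y on X, residual variance, and sum of x squared."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n          # Formula No. 76
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    b = sxy / sxx
    # Residual sum of squares, sum of (Y - Y')^2 = syy - sxy^2 / sxx,
    # divided by N - 2 (assumed divisor: two constants fitted per line).
    s2 = (syy - sxy ** 2 / sxx) / (n - 2)
    return b, s2, sxx


def t_slopes(g1, g2):
    """t for the difference between the regression slopes of two grades."""
    b1, s2_1, sxx1 = slope_stats(*GRADES[g1])
    b2, s2_2, sxx2 = slope_stats(*GRADES[g2])
    se = math.sqrt(s2_1 / sxx1 + s2_2 / sxx2)
    return (b1 - b2) / se


for pair in [(1, 2), (2, 3), (1, 3)]:
    print(pair, round(t_slopes(*pair), 3))
```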


TEST OF CORRELATION COEFFICIENTS 


Computing r for each of our small samples from the deviations
by Formula No. 80, we have:

Formula No. 80

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \, \Sigma y^2}}$$

$$r_1 = \frac{49.33}{\sqrt{22.83 \times 133.67}} = \frac{49.33}{\sqrt{3{,}051.6861}} = \frac{49.33}{55.24} = .893$$

$$r_2 = \frac{14.0}{\sqrt{10.0 \times 39.33}} = \frac{14.0}{\sqrt{393.3}} = \frac{14.0}{19.83} = .706$$

$$r_3 = \frac{41.16}{\sqrt{20.83 \times 134.83}} = \frac{41.16}{\sqrt{2{,}808.5089}} = \frac{41.16}{53.00} = .777$$

Formula No. 81

$$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$$

$$t_1 = \frac{.893\sqrt{6 - 2}}{\sqrt{1 - .893^2}} = \frac{.893 \times 2}{\sqrt{.2026}} = \frac{1.786}{.4501} = 3.968$$

$$t_2 = \frac{.706\sqrt{6 - 2}}{\sqrt{1 - .706^2}} = \frac{.706 \times 2}{\sqrt{.5016}} = \frac{1.412}{.7082} = 1.994$$

$$t_3 = \frac{.777\sqrt{6 - 2}}{\sqrt{1 - .777^2}} = \frac{.777 \times 2}{\sqrt{.3963}} = \frac{1.554}{.6295} = 2.469$$

Table 36 gives a t-value of 2.776 for (N − 2), or 4, degrees of
freedom and a P-value of .05. In our samples only the t for the
first grade, 3.968, is larger than the significant value in the
table. This value is, however, smaller than the t = 4.604 for
P = .01. We may conclude, therefore, that these samples are not
sufficient to establish the fact of significant correlation between
height and weight in all of the first three grades. There is, of
course, the possibility that larger samples might do so.
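A minimal Python sketch of Formulas No. 80 and 81 follows, applied to the raw heights and weights of Worksheet No. 95. The data dictionary and the helper name `corr_t` are illustrative.

Computed without rounding, it gives r of about .894, .706 and .777, and t of about 3.99, 1.99 and 2.47. The small gap from the hand figures arises mainly because the raw weights give Σy² = 133.33 for the first grade, where the worksheet lists 133.67. Only the first-grade t exceeds 2.776, and none reaches 4.604.

```python
import math

# Worksheet No. 95: (heights X, weights Y) for six pupils in each grade.
GRADES = {
    1: ([48, 43, 47, 44, 46, 43], [56, 48, 58, 46, 50, 46]),
    2: ([49, 48, 50, 48, 47, 46], [55, 50, 52, 49, 47, 49]),
    3: ([50, 52, 52, 54, 48, 51], [49, 62, 57, 59, 50, 58]),
}


def corr_t(xs, ys):
    """Return (r, t): r by Formula No. 80, t = r * sqrt(N - 2) / sqrt(1 - r^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return r, t


for grade, (xs, ys) in GRADES.items():
    r, t = corr_t(xs, ys)
    print(f"grade {grade}: r = {r:.3f}, t = {t:.3f}")
```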


TEST OF PERCENTAGE OCCURRENCES 


Formula No. 82

$$t = \frac{p_1 - p_2}{\sqrt{\dfrac{p_1 q_1}{N_1} + \dfrac{p_2 q_2}{N_2}}}$$

in which p = the proportion of favorable occurrences and q = 1 − p,
the proportion of unfavorable occurrences. (See Chapter 14 on
Probabilities.)

In the distribution of class grades for large groups, 8% of the
grades are normally A's. In one freshman class, only 4% of the
grades were A's. The question arises as to whether this small
percentage is a significant departure from the standard. The
solution is as follows:

$$t = \frac{.08 - .04}{\sqrt{\dfrac{.08 \times .92}{100} + \dfrac{.04 \times .96}{25}}} = \frac{.04}{\sqrt{\dfrac{.0736}{100} + \dfrac{.0384}{25}}} = \frac{.04}{\sqrt{.000736 + .001536}} = \frac{.04}{\sqrt{.002272}} = \frac{.04}{.0477} = .839$$

In Table 36 the value of t for P = .05 is 2.060 for a sample of 25.
Even the t-value for a very large sample for P = .05 is 1.95996.
We conclude, therefore, that the difference is not significant.
This percentage test may be applied to large samples as well as
small ones.
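A minimal Python sketch of Formula No. 82 follows; the function name `percentage_t` is illustrative. The two variance terms, $p_1 q_1 / N_1$ and $p_2 q_2 / N_2$, are added under the square root. With the figures of the example, $N_1 = 100$ and $N_2 = 25$, the sketch returns $t \approx .839$.

```python
import math


def percentage_t(p1, n1, p2, n2):
    """t for the difference between two proportions (Formula No. 82)."""
    q1, q2 = 1 - p1, 1 - p2          # proportions of unfavorable occurrences
    se = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    return (p1 - p2) / se


t = percentage_t(0.08, 100, 0.04, 25)
print(f"t = {t:.3f}")
```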


SUMMARY OF TESTS BETWEEN MEASURES 
OF SMALL SAMPLES 

1. The amount of corrected error for small samples of various
sizes up to 25 items is given in Table 36, known as the Table of
t-Values, or simply the t-table.

2. By the methods shown above, the difference between only two
samples at one time may be measured, and for only one statistic
at a time.

3. The degree to which the difference between two sample means
is significant may be measured.

4. The degree to which the difference between the b-values of
the regression lines of two samples is significant may be measured.

5. The degree to which the r-values, or coefficients of
correlation, of small samples are significant may be measured.

6. The degree to which the difference between two percentages
is significant may be measured.

This method is the simplest and most limited in range of all the
methods of testing the significance of the statistics of a small
sample. Only two samples can be used at once. Separate tests
are necessary for each statistic for each pair of samples. The
method, however, may be easily and quickly applied to a few
samples, but if there are a large number of pairs of samples, the
task becomes long and tedious.

CHI-SQUARE MEASURE OF VARIATION 

The degree to which a given sample varies from an accepted
or theoretical standard is measured by Chi-Square, $\chi^2$. It is a
useful device for checking hypotheses against observed
frequencies. It may be used with large samples as well as small
ones, and it can be applied even to quite small series.

Formula No. 83

$$\chi^2 = \frac{(X_1 - m)^2}{m} + \frac{(X_2 - m)^2}{m}$$

in which

m = expected occurrences
X₁ = 1st set of occurrences
X₂ = 2d set of occurrences

One might raise the question of the sex distribution of
grade-school children. In a normal population, the sex
distribution is a 1 : 1, or 50-50, relationship. If we take a random
sample of the sex of the children in 10 rooms by selecting all the
boys and girls in each room whose names begin with the letter A,
we have the following distribution:


WORKSHEET NO. 96
School Rooms

Sex       1   2   3   4   5   6   7   8   9   10   Totals
Boys      1   2   0   1   0   3   1   5   2    3     18
Girls     4   0   1   2   0   2   1   2   1    1     14
Totals    5   2   1   3   0   5   2   7   3    4     32


Various types of analysis may be applied to these small sam- 
ples. A percentage comparison may be made. 


          Number   Percentage
Boys        18        56.25
Girls       14        43.75
Total       32       100.00


If the same percentage analysis should be applied to the separate 
rooms, the fluctuations would be wide and irregular. The samples 
are too small. If, however, there were some definite and per- 
manent relationship between names and sex, and the sex names 
were evenly distributed so that in each room there were two pu- 
pils whose names began with A and one was a boy and the other 
was a girl, so that in this respect all rooms were uniform, the 
analysis of one room would be sufficient to determine the sex 
ratio in all rooms. Such uniformity in sex-name relations does 
not exist, and only an average of several rooms would give a de- 
pendable result. 

One might consider the average number of boys and girls per 
room, whose names began with the letter A. 

18 boys ÷ 10 rooms = 1.8 boys per room
14 girls ÷ 10 rooms = 1.4 girls per room


Still another way of analyzing the data is to compare the
results of these samples with an assumed or accepted theory of
distribution. Since the sex of children seems to be determined by
chance, and the chances are approximately equal that any child 
may be male or female, one may set up the proposition that the 
sex of a group of school children on the average should be 50% 
males and 50% females. Taking this as the parameter of the popu- 
lation, one may compare the sampled rooms with this normal dis- 
tribution to determine whether these school rooms depart from 
the normal to a significant degree. 

Solution of Chi-Square

$$\chi^2 = \frac{(X_1 - m)^2}{m} + \frac{(X_2 - m)^2}{m} = \frac{(18 - 16)^2}{16} + \frac{(14 - 16)^2}{16} = \frac{4}{16} + \frac{4}{16} = .5$$

Since Chi-Square measures the difference between a sample and an
accepted theory or standard, the question arises as to how large
$\chi^2$ must be before it indicates a significant variation. Again we
are indebted to R. A. Fisher, for Table 37, the $\chi^2$ table, which
gives the limits of significant values for samples of various sizes.

It is evident from Table 37 that when $\chi^2 = 0$, P = 1, and when
$\chi^2 = \infty$, P = 0. For all intermediate values of $\chi^2$, P varies
with n. In our sample of two groups of grade-school children,
$\chi^2$ is equal to .5. For it to be significant for this size of sample
(n = 1), it would have to be at least 3.841, the value indicated in
Table 37 for P = .05. Since our $\chi^2$ value of .5 is less than
one-seventh of the value required to indicate a significant
difference, we conclude that the sex variation in our sample can be
accounted for by chance sampling.
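A minimal Python sketch of Formula No. 83 follows, written so that it works for any number of classes; the helper name `chi_square` is illustrative. For 18 boys and 14 girls, against an expected 16 of each, it returns exactly .5. It then compares that value with the Table 37 entry of 3.841 for n = 1 and P = .05.

```python
def chi_square(observed, expected):
    """Formula No. 83 extended to any number of classes:
    the sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


boys, girls = 18, 14
m = (boys + girls) / 2            # 50-50 hypothesis: 16 expected of each sex
chi2 = chi_square([boys, girls], [m, m])
CRITICAL_N1_P05 = 3.841           # Table 37, n = 1, P = .05
print(chi2, "significant" if chi2 >= CRITICAL_N1_P05 else "not significant")
```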

By means of Chi-Square it is possible to measure the degree to
which a given frequency distribution varies from the normal
expectation. If we take the frequency distribution of the weights
of 106 grade-school children as given in Worksheet No. 7, and
compare this rather uneven distribution with the normal curve of
error by computing $\chi^2$, we can determine whether it departs from
normal to a significant degree.
TABLE 37*
Chi-Square Values

 n    P = .99     .95      .50      .10      .05      .02      .01
 1    .000157   .00393    .455    2.706    3.841    5.412    6.635
 2    .0201     .103     1.386    4.605    5.991    7.824    9.210
 3    .115      .352     2.366    6.251    7.815    9.837   11.341
 4    .297      .711     3.357    7.779    9.488   11.668   13.277
 5    .554     1.145     4.351    9.236   11.070   13.388   15.086
 6    .872     1.635     5.348   10.645   12.592   15.033   16.812
 7   1.239     2.167     6.346   12.017   14.067   16.622   18.475
 8   1.646     2.733     7.344   13.362   15.507   18.168   20.090
 9   2.088     3.325     8.343   14.684   16.919   19.679   21.666
10   2.558     3.940     9.342   15.987   18.307   21.161   23.209
11   3.053     4.575    10.341   17.275   19.675   22.618   24.725
12   3.571     5.226    11.340   18.549   21.026   24.054   26.217
13   4.107     5.892    12.340   19.812   22.362   25.472   27.688
14   4.660     6.571    13.339   21.064   23.685   26.873   29.141
15   5.229     7.261    14.339   22.307   24.996   28.259   30.578
16   5.812     7.962    15.338   23.542   26.296   29.633   32.000
17   6.408     8.672    16.338   24.769   27.587   30.995   33.409
18   7.015     9.390    17.338   25.989   28.869   32.346   34.805
19   7.633    10.117    18.338   27.204   30.144   33.687   36.191
20   8.260    10.851    19.337   28.412   31.410   35.020   37.566
21   8.897    11.591    20.337   29.615   32.671   36.343   38.932
22   9.542    12.338    21.337   30.813   33.924   37.659   40.289
23  10.196    13.091    22.337   32.007   35.172   38.968   41.638
24  10.856    13.848    23.337   33.196   36.415   40.270   42.980
25  11.524    14.611    24.337   34.382   37.652   41.566   44.314
26  12.198    15.379    25.336   35.563   38.885   42.856   45.642
27  12.879    16.151    26.336   36.741   40.113   44.140   46.963
28  13.565    16.928    27.336   37.916   41.337   45.419   48.278
29  14.256    17.708    28.336   39.087   42.557   46.693   49.588
30  14.953    18.493    29.336   40.256   43.773   47.962   50.892

* This table is reproduced through the courtesy of R. A. Fisher and his
publishers, Oliver and Boyd, Edinburgh. The entries are taken from Table III,
Statistical Methods for Research Workers.
This computation is shown in Worksheet No. 97. The original
frequency distribution is taken from Worksheet No. 7 in Chapter 6. The
student will notice that the first three classes of the original
frequency distribution are combined in Worksheet No. 97 as one
class, “Under 46.5,” and that the last four classes of the original
frequency distribution are combined as one class, “Over 58.4.”
This consolidation of the two tail groups is made to prevent an
over-magnification of the effects of a small absolute deviation,
which may easily occur in small frequencies. For instance, when
the last four classes of the original frequency distribution are
considered separately, their contribution to $\chi^2$ is 3.978, but
when they are combined it is only .700. In the first case the
over-magnification of the small absolute deviations +4.0, −1.3,
and +0.1 expands $\chi^2$ by (3.978 − .700), or 3.278. That is more
than one-fourth of the total of 12.251 obtained by the combined
method. With all classes used separately, $\chi^2$ = 17.778. With the
small end classes combined, it is reduced to 12.251, which more
nearly represents the actual deviation between the observed and
theoretical frequencies.


WORKSHEET NO. 97

Computation of χ² Testing Goodness of Fit of the Normal
Curve of Error to the Distribution of 106 Stillwater,
Oklahoma, Grade-School Children

    (1)           (2)            (3)           (4)         (5)           (6)
   Class        Observed     Theoretical
 Intervals     Frequencies   Frequencies
                   f₁             f₂         f₁ − f₂    (f₁ − f₂)²   (f₁ − f₂)²/f₂
Under 46.5          7            10.5         −3.5        12.25         1.167
46.5-48.4          18            10.4         +7.6        57.76         5.554
48.5-50.4          13            14.4         −1.4         1.96          .136
50.5-52.4          14            17.7         −3.7        13.69          .773
52.5-54.4          23            17.5         +5.5        30.25         1.728
54.5-56.4          10            14.2         −4.2        17.64         1.242
56.5-58.4           7            10.1         −3.1         9.61          .951
Over 58.4          14            11.2         +2.8         7.84          .700
Totals            106           106.0                              χ² = 12.251
Formula No. 84

$$\chi^2 = \Sigma \frac{(f_1 - f_2)^2}{f_2} = 12.251$$

If we check Table 37 for (n − 3), here 8 − 3 = 5, and P = .01, we
find $\chi^2$ = 15.086, which is considerably larger than the figure
12.251 for our distribution. We conclude, therefore, that at the .01
level our frequency distribution does not differ significantly from
a normal distribution. Such deviation from the normal curve as it
shows may be accounted for by chance sampling.
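A short Python sketch can check the sum in Formula No. 84 using the observed and theoretical frequencies of Worksheet No. 97. The comment on why three degrees of freedom are lost is an assumption added here, not a statement from the text: it gives the usual explanation, that the total, mean and standard deviation of the fitted curve are taken from the data. The two critical values are copied from Table 37 for n = 5.

```python
# Worksheet No. 97: observed (f1) and theoretical (f2) frequencies per class.
observed = [7, 18, 13, 14, 23, 10, 7, 14]
theoretical = [10.5, 10.4, 14.4, 17.7, 17.5, 14.2, 10.1, 11.2]

chi2 = sum((f1 - f2) ** 2 / f2 for f1, f2 in zip(observed, theoretical))
df = len(observed) - 3   # assumed reason: total, mean, and s fixed by the fitted curve

# Critical values copied from Table 37 for n = 5.
TABLE_37_N5 = {0.05: 11.070, 0.01: 15.086}
print(f"chi-square = {chi2:.3f} with n = {df}")
for p, crit in TABLE_37_N5.items():
    verdict = "exceeds" if chi2 > crit else "is below"
    print(f"P = {p}: {verdict} {crit}")
```

The sketch reports $\chi^2$ of about 12.25. That is above the P = .05 entry (11.070) but below the P = .01 entry (15.086), so the conclusion above rests on the stricter 1 percent standard.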

This device for checking the conformity of samples with accepted
or standard hypotheses is of great value in agricultural,
biological, chemical, and other laboratory sciences in which it is
difficult or impossible to obtain large samples. $\chi^2$ does not
reveal a great deal of information about the data. It does indicate,
however, whether the deviations from the hypothesis are large enough
to be significant and to warrant further study. In developing a new strain of
plants or a new breed of animals, one must always determine
whether the peculiar characteristics of the new order indicate a
separate strain or are simply caused by chance sampling. $\chi^2$ may
also be used in checking the conformity of economic, sociological,
psychological, and other social science data.


THE CHI-SQUARE TEST OF HOMOGENEITY 

In many fields of statistical study it is desirable to test the 
similarity, likeness, or homogeneity of a total group of data. 
This may be done by sub-dividing the total group into sub- 
classes according to some standard classification, as equality of 
purchasing power, class grades, courses failed, incomes, length of 
service, rank, or other such accepted classifications, or by arbi- 
trarily dividing the total group into sub-classes to test their 
homogeneity. The failures of the four standard college classes in 
a university are so tested in Worksheet No. 98. 

The question arises as to whether this student body is a
homogeneous group as far as failures are concerned. To test the
assumption that it is a homogeneous group, theoretical or
expected failures and non-failures for each class are computed by
applying the percentage of failures and non-failures for the total 
group to each class. The expected or theoretical frequencies for 
students not failing are computed by taking the percentage of 

WORKSHEET NO. 98

Comparison of Observed and Theoretical Frequencies of
Students Enrolled and Failures and Non-Failures
by Four College Classes

                                  Seniors  Juniors  Sophomores  Freshmen   Totals   Percent
 1. Total number of students         634      708      1,030      1,543     3,915    100.0
 2. Actual number of students
    without failures (f₁)            487      286        262        246     1,281    32.7203
 3. Theoretical number of students
    without failures (f₂)            207      232        337        505     1,281
 4. f₁ − f₂                         +280      +54        −75       −259         0
 5. (f₁ − f₂)²                    78,400    2,916      5,625     67,081
 6. (f₁ − f₂)²/f₂                  378.7     12.6       16.7      132.8
 7. Actual number of students
    with failures (f₁)               147      422        768      1,297     2,634
 8. Theoretical number of students
    with failures (f₂)               427      476        693      1,038     2,634    67.2797
 9. f₁ − f₂                         −280      −54        +75       +259         0
10. (f₁ − f₂)²                    78,400    2,916      5,625     67,081
11. (f₁ − f₂)²/f₂                  183.6      6.1        8.1       64.6

χ² (sum of lines 6 and 11) = 803.2


those students, 1,281, as against 3,915, the total number of
students. This percentage is 32.7203. Multiplying the total number
in each class by this percentage gives the theoretical frequencies
of students who, if the group is homogeneous, are expected not to
fail: 207 seniors, 232 juniors, 337 sophomores, and 505 freshmen.
These figures are shown in line 3.
In like manner, if the total number of students in each class is
multiplied by the average percentage of failures, 67.2797, the
expected or theoretical number of failures for each class is
obtained. These figures appear in line 8.

The deviations between the actual values, $f_1$, and the
theoretical values, $f_2$, are shown in lines 4 and 9. These
deviations are squared (lines 5 and 10) and divided by the
theoretical frequency, $f_2$, of their class. The final results
appear in lines 6 and 11 and together equal 803.2, the total value
of $\chi^2$.


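Since

$$\chi^2 = \sum \frac{(f_1 - f_2)^2}{f_2},$$

$$\chi^2 = \frac{(280)^2}{207} + \frac{(54)^2}{232} + \frac{(75)^2}{337} + \frac{(259)^2}{505} + \frac{(280)^2}{427} + \frac{(54)^2}{476} + \frac{(75)^2}{693} + \frac{(259)^2}{1038} = 803.2$$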


The value of χ² computed for these data is 803.2. The table has 2 rows (failures and non-failures) and 4 columns (the four classes), so the number of degrees of freedom is (2 - 1)(4 - 1) = 3. According to Table 37, Chi-Square Values, the value of χ² for 3 degrees of freedom and p = .01 is 11.341. Since the value of χ² we obtained for the college classes is so much larger, one must conclude that these four college classes do not form a homogeneous group. Their difference is too great to be accounted for by chance sampling; it is a real difference. Seniors and juniors are a more highly selected group than freshmen, and therefore are better adjusted to college work.
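The arithmetic of Worksheet No. 98 can be checked with a short Python sketch. It assumes the 2 × 4 table of observed counts exactly as given in the worksheet; the function name `chi_square` and its `round_expected` switch are illustrative, not taken from any library. Each theoretical frequency is the class total multiplied by the proportion that its row (failures or non-failures) forms of all 3,915 students, which is the procedure described above.

```python
# Observed frequencies from Worksheet No. 98.
# Row 0: students without failures; row 1: students with failures.
# Columns: seniors, juniors, sophomores, freshmen.
observed = [
    [487, 286, 262, 246],
    [147, 422, 768, 1297],
]


def chi_square(table, round_expected=False):
    """Return (chi-square, degrees of freedom) for an r x c table of counts.

    Each theoretical frequency f2 is the column (class) total multiplied by the
    proportion that its row forms of the grand total. With round_expected=True
    the f2 values are rounded to whole numbers, as the worksheet does.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f1 in enumerate(row):
            f2 = col_totals[j] * row_totals[i] / grand_total
            if round_expected:
                f2 = round(f2)
            chi2 += (f1 - f2) ** 2 / f2
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


chi2, dof = chi_square(observed)
chi2_rounded, _ = chi_square(observed, round_expected=True)
print(f"chi-square = {chi2:.1f} (exact f2), {chi2_rounded:.1f} (rounded f2), df = {dof}")

# 1% value of chi-square for 3 degrees of freedom, as quoted from Table 37.
CRITICAL_1_PERCENT_3_DF = 11.341
print("exceeds the 1% value:", chi2 > CRITICAL_1_PERCENT_3_DF)
```

With unrounded theoretical frequencies the total comes to about 801; rounding them to whole numbers, as the worksheet does, gives about 803.3, which differs from 803.2 only because the worksheet also rounds each entry of lines 6 and 11. Either way the value is far above 11.341.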


SUMMARY 

1. A small sample may be arbitrarily designated as a sample containing fewer than 30 items, although the demarcation between small and large samples is a zone rather than a point.

2. Small sample analyses are made necessary by the application of 
statistics to experimental laboratory sciences from which often only small 
samples can be obtained. 

3. The mean of a small sample tends to be more erratic or less depend- 
able than the mean of a larger sample. 

4. The standard deviation of a small sample tends to be too small, 
and, therefore, to give a standard error of estimate that is too small. 
5. The statistics computed from small samples become dependable when adequate adjustments are made for their tendency to be too small. The so-called t-table gives the ratios of dependability of small samples of various sizes.

6. A statistical result is said to be significant when this result could 
have occurred only one time in twenty as the result of chance sampling. 

7. A statistical result is said to be highly significant when the result 
could have occurred only one time in one hundred as the result of chance 
sampling. 

8. The adjustments provided in the t-table of ratios may be applied to the means, standard deviations, regression coefficients, coefficients of correlation, percentages, and the difference between means of small samples. But t-table ratios can be applied to differences between only two statistics at one time.

9. Chi-square is a statistical device which enables one to measure the significance of the differences among more than two statistics at one time. It is more extensive and flexible than the methods resting on the t-table ratios.

10. Chi-square is useful in comparing data with an established or ac- 
ceptable standard or theory, or comparing a given frequency distribution 
with the normal curve. 

11. Chi-square is useful in measuring the homogeneity of a sample such 
as a marketing area, a population group, or other social, political, or eco- 
nomic body. 
REVIEW QUESTIONS

1. What is a small sample? How many items must a sample contain 
before it ceases to be small? 

2. What are the characteristics of the means of small samples? Why? 
3. What are the characteristics of the standard deviations of small samples? Why?

4. What are the characteristics of the standard errors of small samples? 
Why? 

5. What is the meaning and use of t?

6. How is t derived? 

7. What statistical measures and relations may be tested by t?

8. What is Chi-square? 

9. What are the uses of χ², and what are its limitations?

10. Why are small samples advantageous? Are they better than large 
samples? 

11. In what fields are small samples most necessary?


EXERCISES 


1. Population samples from six small frontier mining camps:

                      Camps
              1     2     3     4     5     6
   Males     21    12    32    17    46    29
   Females    6     7    12     5    27    22

   Compute χ².

2. Data on second generation of mixed white and red peas on eight vines:

                            Vines
              1     2     3     4     5     6     7     8
   Colored   29    32    22    41    19    31    37    40
   White      7     7     6    11     5     8     9    10

   Compute χ².

3. Class Intervals     f
       4- 9.9          2
      10-15.9          5
      16-21.9          4
      22-27.9         18
      28-33.9          7
      34-39.9         12
      40-45.9          1

   Compute the test of departure from the normal curve.





4. First sample: X̄1 = 22, σ1 = 4, N = 37
   Second sample: X̄2 = 17, σ2 = 3, N = 50
   Compute σD and t.


5.  X1    X2
     7    12
     8    10
     5     9
     4     5
     6     9
     7    11
     5     8
     8    11
     9    12
     6     8

6.  X1    Y1    X2    Y2
     4     7     7    12
     6     8     5     9
     3     4     6     9
     7     9     5     8
     5     6     7



Sl-2 


Compute, t = 



^ hx — 62 


11 


and 




CHAPTER 21 


THE ANALYSIS OF VARIANCE 


In Chapter 10, the theory and methods of computing the standard deviation were developed and explained. In later chapters we have seen that statisticians make wide use of the standard deviation in measures of correlation, error, variation, and regression. We shall now modify this measure for other uses. The standard deviation is found by squaring the deviation of each item from the mean, summing these squares, dividing the sum by the number of items, and taking the square root of the result. As a measure of variation and deviation, better results for many purposes may be obtained by using the square of the standard deviation.


$$\sigma_x = \sqrt{\frac{\Sigma x^2}{N}} = \text{Standard deviation}$$

Formula No. 85

$$\sigma_x^2 = \frac{\Sigma x^2}{N} = V = \text{Variance}$$

Variance is the name given to the square of the standard deviation by R. A. Fisher, who developed its theory and uses. Since the squared deviations from the mean are always positive in sign, may be easily manipulated mathematically, and have come into general use as a measure of deviation and variation, it seems logical to retain them in their squared form throughout the comparisons and computations. V, or σ², is as easy to manipulate as σ and, as we shall see, has certain advantages of simplicity.
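Formula No. 85 is short enough to check by hand, but a small Python sketch makes the relation V = σ² concrete. For illustration it assumes the ten items of the first sample in Worksheet No. 99 (mean 12) and divides by N, as the formula does; `variance` is an illustrative helper, and the standard-library `statistics.pvariance` is used only as a cross-check.

```python
import math
import statistics

# First sample of Worksheet No. 99 (mean 12).
sample = [10, 11, 11, 12, 12, 12, 12, 13, 13, 14]


def variance(values):
    """Formula No. 85: the sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


V = variance(sample)
sigma = math.sqrt(V)  # the standard deviation is the square root of the variance
print(f"V = {V:.3f}, sigma = {sigma:.3f}")

# statistics.pvariance also divides by N, so it agrees with Formula No. 85.
assert math.isclose(V, statistics.pvariance(sample))
```

Here Σx² = 12 and N = 10, so V = 1.2 and σ is about 1.095. Dividing by N - 1 instead, as `statistics.variance` does, would give 12/9, or about 1.333, for the same sample.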
In Chapters 14 and 20 we learned methods of determining whether two samples had been taken from the same parent population or from separate populations. One of these measures was the test for significance in t-tables; another was that of Chi-square. With these methods in mind, the student is prepared to consider a more adequate and complex method which will measure the significance of the differences among several means at once. This is the analysis of variance.

DETAILED EXPLANATION OF VARIANCE


The purpose of the following illustration is to hold the analysis 
of variance, as it were, under a microscope, so that the student 
may see clearly every detail. 



Fig. 91. Illustrating divergence between samples in analysis of variance

Assuming that we wish to determine whether two samples of corn are taken from two separate populations or are in reality only two samples from the same parent population, we must measure the divergence of their means in relation to their total deviation from a common mean. The distribution A extends from 10 to 15, and the distribution B1 from 11 to 16 on the X-scale. If B also extended from 10 to 15, its mean and range would be identical with the mean and range of A. Under this condition it would be evident that the two samples were identical and were, therefore, taken from the same parent population. The null hypothesis would be sustained. Since B1 is not identical with A, but is one unit to the right, it might be that this increase in size of B1 is sufficient to prove that it comes from a separate population. However, if the difference between A and B1 is too small to be significant, then if B were moved farther to the right, to the position of B2, it might be so different from A that the difference would be significant. If we continued to move B farther still to the right, it would finally reach the position of Bn, where the difference between A and Bn would clearly be significant. The purpose of the analysis of variance is not to determine the exact point on X where the difference between A and B becomes significant, but to determine whether the difference is significant at the point where B is located.
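The argument about sliding B to the right can be made concrete with a small Python sketch. The eight values in A are hypothetical (the text gives only the range 10 to 15), each B is a copy of A moved `shift` units to the right, and `f_ratio` is an illustrative helper that divides the between-samples mean square by the within-samples mean square, the ratio developed in the rest of this chapter.

```python
def f_ratio(*groups):
    """F = between-groups mean square / within-groups mean square (one-way analysis of variance)."""
    all_items = [x for g in groups for x in g]
    grand_mean = sum(all_items) / len(all_items)
    means = [sum(g) / len(g) for g in groups]
    # Squared difference of each group mean from the common mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Squared deviations of the items from their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_items) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical distribution A running from 10 to 15 on the X-scale.
A = [10, 11, 12, 12, 13, 13, 14, 15]

# Slide a copy of A to the right one unit at a time and watch F grow.
for shift in range(4):
    B = [x + shift for x in A]
    print(f"B shifted {shift} unit(s): F = {f_ratio(A, B):.2f}")
```

Because the spread within each sample stays the same while the means draw apart, F rises from 0 through about 1.56 and 6.22 to 14.0. With 1 and 14 degrees of freedom, Table 38 gives 4.60 at the 5% level and 8.86 at the 1% level, so in this made-up case a shift of one unit is not significant, two units is significant, and three units is highly significant.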


WORKSHEET NO. 99

[Fig. 92. Showing combination of two samples in analysis of variance]

1st Sample            2d Sample             2 Samples Combined
X - X̄1 = x     x²     X - X̄2 = x     x²     X - X̄ = x              x²
10 - 12 = -2    4     13 - 15 = -2    4     10 - 13.5 = -3.5   12.25
11 - 12 = -1    1     14 - 15 = -1    1     11 - 13.5 = -2.5    6.25
11 - 12 = -1    1     14 - 15 = -1    1     11 - 13.5 = -2.5    6.25
12 - 12 =  0    0     15 - 15 =  0    0     12 - 13.5 = -1.5    2.25
12 - 12 =  0    0     15 - 15 =  0    0     12 - 13.5 = -1.5    2.25
12 - 12 =  0    0     15 - 15 =  0    0     12 - 13.5 = -1.5    2.25
12 - 12 =  0    0     15 - 15 =  0    0     12 - 13.5 = -1.5    2.25
13 - 12 = +1    1     16 - 15 = +1    1     13 - 13.5 = -0.5     .25
13 - 12 = +1    1     16 - 15 = +1    1     13 - 13.5 = -0.5     .25
14 - 12 = +2    4     17 - 15 = +2    4     13 - 13.5 = -0.5     .25
                                            14 - 13.5 = +0.5     .25
                                            14 - 13.5 = +0.5     .25
                                            14 - 13.5 = +0.5     .25
                                            15 - 13.5 = +1.5    2.25
                                            15 - 13.5 = +1.5    2.25
                                            15 - 13.5 = +1.5    2.25
                                            15 - 13.5 = +1.5    2.25
                                            16 - 13.5 = +2.5    6.25
                                            16 - 13.5 = +2.5    6.25
                                            17 - 13.5 = +3.5   12.25
ΣX = 120       12     ΣX = 150       12     ΣX = 270           69.00

X̄1 = 120/10 = 12      X̄2 = 150/10 = 15      X̄ = 270/20 = 13.5
(1) The total sum of squares for the combined samples is 69.0.

(2) (X̄1+2 - X̄1)² = (13.5 - 12.0)² = (1.5)² = 2.25
    (X̄1+2 - X̄2)² = (13.5 - 15.0)² = (-1.5)² = 2.25

    2.25 × 10 (the number of items in 1st sample) = 22.5 = sum of squares between 1st sample mean and total group mean.
    2.25 × 10 (the number of items in 2nd sample) = 22.5 = sum of squares between 2nd sample mean and total group mean.
    22.5 + 22.5 = 45.0 = total of squared differences between the two sample means and the total group mean.

    Sum of squares between samples = 45.0

(3) Total x² (squared deviations within 1st sample) = 12
    Total x² (squared deviations within 2nd sample) = 12
    Total sum of squares of deviations within the two samples = 24.0

Total sum of squares of combined samples = 69
Sum of squares between samples = 45
Sum of squares within samples = 24


            df    Sum of Squares    Mean Squares
Total       19          69
Between      1          45               45.0
Within      18          24                1.333

$$F = \frac{\text{Variance between group averages}}{\text{Variance within groups}} = \frac{45.0}{1.333} = 33.8$$


The total df is 19 because the combined samples contained 20 items, and (20 - 1) = 19. The number of degrees of freedom between samples is 1 because there are 2 samples, and (2 - 1) = 1.

The remaining 18 degrees, the difference between 19 and 1, are allotted to the individual items within the two samples. The proof that 18 is the correct figure is that each sample has 10 items, and

Samples    Total    df
A          10       (N - 1) = 9
B          10       (N - 1) = 9
                              18
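The partition worked out by hand in Worksheet No. 99 can be reproduced with a short Python sketch. It assumes the twenty items as read from the worksheet columns; `analysis_of_variance` is an illustrative name, not a library function. It computes the total, between-samples, and within-samples sums of squares, assigns the degrees of freedom as just described, and forms F as the ratio of the two mean squares.

```python
sample_1 = [10, 11, 11, 12, 12, 12, 12, 13, 13, 14]
sample_2 = [13, 14, 14, 15, 15, 15, 15, 16, 16, 17]


def analysis_of_variance(groups):
    """Partition the total sum of squares into between- and within-groups parts."""
    combined = [x for g in groups for x in g]
    grand_mean = sum(combined) / len(combined)
    means = [sum(g) / len(g) for g in groups]

    ss_total = sum((x - grand_mean) ** 2 for x in combined)
    # Each squared difference between a sample mean and the total mean is
    # weighted by the number of items in that sample (2.25 x 10 in the worksheet).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_total = len(combined) - 1
    df_between = len(groups) - 1
    df_within = df_total - df_between

    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "total": (df_total, ss_total, None),
        "between": (df_between, ss_between, ms_between),
        "within": (df_within, ss_within, ms_within),
        "F": ms_between / ms_within,
    }


result = analysis_of_variance([sample_1, sample_2])
for source in ("total", "between", "within"):
    df, ss, ms = result[source]
    ms_text = "" if ms is None else f"{ms:8.3f}"
    print(f"{source:8s} {df:3d} {ss:8.2f} {ms_text}")
print(f"F = {result['F']:.2f}")
```

The sketch returns sums of squares of 69, 45, and 24 with 19, 1, and 18 degrees of freedom, and F = 33.75, which rounds to 33.8. The between and within parts always add up to the total, which is a useful arithmetic check on any hand computation.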
We, perhaps, now have sufficient insight into the methods of the analysis of variance to undertake the solution of a practical problem. It will be recalled that in Chapter 20 we questioned for the first time our original assumption that the heights of children in the first six grades were a single homogeneous population. Perhaps we are dealing with six populations instead of one. Let us now test this assumption more completely. We shall first set up the null hypothesis that there are no significant differences among the heights of children in the first six school grades. Second, from our school data we shall draw a small random sample of 8 items for each of the six grades and proceed to test the hypothesis of no significant differences in height.


WORKSHEET NO. 100

Analysis of Variance of Heights of 48 Children Distributed
Equally Among the First Six Grades of Stillwater Schools

                         Grades
Child       1      2      3      4      5      6    Totals
  1        46     47     51     51     56     60       311
  2        48     50     54     55     57     59       323
  3        46     47     51     52     59     60       315
  4        43     48     48     57     56     60       312
  5        44     50     49     54     54     61       312
  6        50     49     55     51     59     59       323
  7        45     48     52     53     56     59       313
  8        41     49     53     54     54     56       307
Totals    363    388    413    427    451    474     2,516


1st Step. Computation of Total Variance for 6 Grades

Formula No. 86

$$\Sigma y^2 = \Sigma Y^2 - \frac{(\Sigma Y)^2}{N}$$

$$\Sigma Y^2 = (46)^2 + (48)^2 + (46)^2 + \cdots + (56)^2 = 133{,}088$$

$$\Sigma Y = 2{,}516$$

$$\frac{(\Sigma Y)^2}{N} = \frac{(2{,}516)^2}{48} = \frac{6{,}330{,}256}{48} = 131{,}880.33 = \text{correction}$$

$$\Sigma y^2 = 133{,}088 - 131{,}880.33 = 1{,}207.67$$

2d Step. Computation of Variance between the Means of Grades

The sum of (Grade Totals)² = (363)² + (388)² + ... + (474)² = 1,063,288.

This total is divided by f, 8, the number of items in each grade sample, in order to give the squares of the class means, each weighted by the number of observations upon which it is based.

Formula No. 87

$$\Sigma c^2 = \frac{(\Sigma Y_1)^2 + (\Sigma Y_2)^2 + \cdots + (\Sigma Y_n)^2}{f} - \frac{(\Sigma Y)^2}{N}$$

In which

ΣY1, ΣY2, ..., ΣYn = Class totals
ΣY = Grand total of all classes
Σc² = Sum of squared deviations of class means from common mean
f = Number of items in a class

1,063,288 / 8 = 132,911

132,911 - 131,880.33 = 1,030.67 = Total squared deviations between classes.

The squared deviations within grades are the remainder: 1,207.67 - 1,030.67 = 177.00.

Computation of F, Test of Significance of Differences Among Grade Means

                  Degrees of    Sum of       Mean
                  Freedom       Squares      Squares
Total                 47        1,207.67
Between Grades         5        1,030.67     206.13
Within Grades         42          177.00       4.214

$$F = \frac{206.13}{4.214} = 48.9$$

This worksheet shows the analysis of variance in its simplest form, which is very easy for even the beginning student to compute. In Table 38 we may locate the F-value for 5 and 42 degrees of freedom and compare it with our F of 48.9, which proves to be highly significant.

The method of locating the desired F-value in Table 38 is to look across the top of the table along n1 till one arrives at the column of figures directly under the number of degrees of freedom for the between-classes variance (the number of classes less one, here 6 - 1 = 5). Then follow down that column of values until one is opposite the number in the left-hand column of n2, which is the number of degrees of freedom for the within-classes variance (the number of items less the number of classes, here 48 - 6 = 42). For our problem of classes of school children, n1 = 5, that is, the fifth column at the top of Table 38, and n2 = 42, which is located on the left-hand side of the table. The top figure in light type, 2.44, is the 5% level of significance. The lower bold-face figure, 3.49, is the 1% level, or highly significant value. Our F-value of 48.9 is about fourteen times this highly significant value. This proves that these 6 samples are taken from 6 separate populations. We must conclude, therefore, that as far as height is concerned, each grade of the 6 grades of the common schools is a separate universe. The amount of height a child gains in a year definitely separates him in height from the child a year younger or a year older. From the standpoint of a completely accurate analysis of our sample, this discovery throws our data into an entirely new light.
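For checking the arithmetic of Worksheet No. 100, here is a minimal Python sketch of the same computing routine (Formulas No. 86 and 87), assuming the heights are transcribed exactly as tabulated in the worksheet; the variable names are illustrative. The within-grades sum of squares is obtained, as in the worksheet, by subtracting the between-grades sum from the total.

```python
# Heights from Worksheet No. 100: eight children in each of grades 1 to 6.
heights_by_grade = {
    1: [46, 48, 46, 43, 44, 50, 45, 41],
    2: [47, 50, 47, 48, 50, 49, 48, 49],
    3: [51, 54, 51, 48, 49, 55, 52, 53],
    4: [51, 55, 52, 57, 54, 51, 53, 54],
    5: [56, 57, 59, 56, 54, 59, 56, 54],
    6: [60, 59, 60, 60, 61, 59, 59, 56],
}

groups = list(heights_by_grade.values())
all_heights = [h for g in groups for h in g]
N = len(all_heights)
grand_total = sum(all_heights)                      # ΣY
correction = grand_total ** 2 / N                   # (ΣY)² / N
sum_of_squares = sum(h * h for h in all_heights)    # ΣY²

ss_total = sum_of_squares - correction              # Formula No. 86
# Formula No. 87: each squared class total divided by its class size f, less the correction.
ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
ss_within = ss_total - ss_between

df_between = len(groups) - 1
df_within = N - len(groups)
ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within

print(f"sum of Y^2 = {sum_of_squares}, correction = {correction:.2f}")
print(f"total SS = {ss_total:.2f}, between SS = {ss_between:.2f}, within SS = {ss_within:.2f}")
print(f"F = {ms_between:.2f} / {ms_within:.3f} = {F:.2f} with {df_between} and {df_within} df")

# 5% and 1% values of F for n1 = 5 and n2 = 42, as read from Table 38.
print("significant at 5%:", F > 2.44, "| at 1%:", F > 3.49)
```

Run on the tabulated heights, the sketch gives ΣY² = 133,088, a total sum of squares of 1,207.67, a between-grades sum of 1,030.67, a within-grades sum of 177.00, and F of about 48.9 for 5 and 42 degrees of freedom, far above the 1% value of 3.49.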

SUMMARY OF BASIC RELATIONS OF ANALYSIS 
OF VARIANCE 

A glance at the data in Worksheet No. 100, showing the heights of children for the six primary grades, reveals that there is considerable variation in height within each class. The range of each of the six grades is

Grade    Range
  1      41-50
  2      47-50
  3      48-55
  4      51-57
  5      54-59
  6      56-61



TABLE 38 

5% OR 95% IN Light-Face Type, 

n1 degrees of freedom



n2

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 


1 

161 

4,052 

200 

4,999 

216 

5,403 

225 

5,625 

230 

5,764 

234 

5,859 

237 

5,928 

239 

5,981 

241 

6,022 

242 

6,056 

243 

6,082 

244 

6,106 


2 

18 51 

98.49 

19 00 
99.01 

19.16 

99.17 

19 25 

99.26 

19 30 

99.30 

19 33 

99.33 

19 36 

99.34 

19 37 

99.36 

19 38 

99.38 

19 39 

99.40 

19 40 

99.41 

19.41 

99.42 


3 

10 13 

34.12 

9.55 

30.81 

9 28 

29.46 

9 12 

28.71 

9 01 

28.24 

8.94 

27.91 

8 88 

27.67 

8 84 

27.49 

8.81 

27.34 

8 78 

27.23 

8.76 

27.13 

8.74 

27.05 


4 

7 71 
21.20 

6.94 

18.00 

6.59 

16.69 

6 39 

15.98

6 26 
15.52 

6 16 
15.21 

6 09 

14.98 

6.04 

14.80 

6 00 

14.66 

5 96 

14.54 

5 93 

14.45 

5.91 

14.37 


5 

6 61 
16.26 

5 79 

13.27 

5.41 

12.06 

519 

11.39 

5 05 

10.97 

4.95 

10.67 

4 88 

10.45 

4 82 

10.27 

4.78 

10.15 

4.74 

10.05 

4 70 

9.96 

4.68 

9.89 


6 

5.99 

13.74 

514 

10.92 

4 76 

9.78 

4.53 

9.15 

4 39 

8.76 

4 28 

8.47 

4 21 

8.26 

415 

8.10 

4 10 

7.98 

4 06 

7.87 

4.03 

7.79 

4 00 

7.72 

D 

e 


5.59 

12.25 

4.74 

9.55 

4 35 

8.45 

4 12 

7.85 

3 97 

7.46 

3.87 

7.19 

3 79 

7.00 

3 73 

6.84 

3 68 

6.71 

3 63 

6.62 

3.60 

6.54 

3.57 

6.47 

g 

r 

e 

8 

5 32 

11.26 

4 46 

8.65 

4 07 

7.59 

384 

7.01 

3 69 

6.63 

3 58 

6.37 

3 50 

6.19 

3 44 

6.03 

3 39 

5.91 

3 34 

5.82 

3 31 

5.74 

3.28 

5.67 

e 

s 

9 

5.12 

10.56 

4.26 

8.02 

3.86 

6.99 

3 63 

6.42 

3 48 

6.06 

3 37 

5.80 

3 29 

5.62 

3 23 

5.47 

3 18 

5.35 

3.13 

5.26 

3 10 
5.18 

3.07 

5.11 

f 

10 

4.96 

10.04 

410 

7.56 

3 71 

6.55 

3.48 

5.99 

3 33 
5.64 

3 22 
5.39 

3 14 
5.21 

3 07 

5.06 

3 02 

4.95 

2.97 

4.85 

2 94 

4.78 

2.91 

4.71 

r 

e 

11 

4.84 

9.65 

3 98 

7.20 

3.59 

6.22 

3 36 

5.67 

3 20 
5.32 

3.09 

5.07 

3 01 

4.88 

2 95 

4.74 

2 90 

4.63 

2.86 

4.54 

2.82 

4.46 

2 79 

4.40 

e 

d 

0 

12 

4.75 

9.33 

3 88 

6.93 

3 49 
5.95 

3 26 

5.41 

3.11 

5.06 

300 

4.82 

2 92 

4.65 

2 85 

4.50 

2 80 

4.39 

2.76 

4.30 

2 72 

4.22 

2 69 

4.16 

m 

13 

4.67 

9.07 

3 80 

6.70 

3 41 

5.74 

3 18 
5.20 

3 02 
4.86

2 92 

4.62 

2.84 

4.44 

2 77 

4.30 

2.72 

4.19 

2 67 

4.10 

2 63 

4.02 

2 60 

3.96 

1 

6 

14 

4 60 

8.86 

3.74 

6.51 

3 34 

5.56 

311 

5.03 

2 96 
4.69 

2.85 

4.46 

2.77 

4.28 

2 70 

4.14 

2 65 

4.03 

2 60 

3.94 

2 56 

3.86 

2.53

3.80 

S 

s 

15 

4 54 

8.68 

3.68 

6.36 

3.29 

5.42 

306 

4.89 

2 90 

4.56 

2.79 

4.32 

2.70 

4.14 

2 64 

4.00 

2 59 

3.89 

2 55 

3.80 

2.51 

3.73 

2 48 

3.67 

e 

r 

16 

4 49 

8.63 

3.63 

6.23 

3 24 

5.29 

3.01 

4.77 

2 86 

4.44 

2 74 

4.20 

2.66 

4.03 

2 59 

3.89 

2 54 

3.78 

2 49 

3.69 

2.45 

3.61 

2.42 

3.55 

V 

17 

4 45 

8.40 

3.59 

6.11 

3 20 

5.18 

2 96 

4.67 

2.81 

4.34 

2 70 

4.10 

2 62 

3.93 

2.55 

3.79 

2.50 

3.68 

2 45 

3.59 

2 41 
3.52

2 38 

3.45 

a 

r 

i 

18 

4 41 
8.28 

3.55 

6.01 

3.16 

5.09 

2 93 

4.58 

2 77 

4.25 

2.66 

4.01 

2 58 

3.85 

2.51 

3.71 

2 46 

3.60 

2.41 

3.51 

2 37 

3.44 

2 34 

3.37 

a 

n 

19 

4.38 

8.18 

3.52 

5.93 

3.13 

5.01 

2.90 

4.50 

2 74 

4.17 

2 63 

3.94 

2 56 

3.77 

2.48 

3.63 

2 43 

3.52 

2 38 

3.43 

2 34 

3.36 

2 31 
3.30 

c 

e 

20 

4 35 

8.10 

3.49 

5.85 

3.10 

4.94 

2 87 

4.43 

2.71 

4.10 

2 60 

3.87 

2 52 

3.71 

2.45 

3.56 

2 40 

3.45 

2.35 

3.37 

2 31 
3.30 

2 28 

3.23 


21 

4.32 

8.02 

3 47 

5.78

3.07 

4.87 

2 84 

4.37 

2 68 
4.04 

2.57 

3.81 

2 49 

3.65 

2 42 

3.51 

2.37 

3.40 

2 32 
3.31 

2 28 

3.24 

2 25 

3.17 


22 

4.30 

7.94 

3.44 

5.72 

3.05 

4.82 

2.82 

4.31 

2.66 

3.99 

2.55 

3.76

2.47 

3.59 

2.40 

3.45 

2.35 

3.35 

2 30 

3.26 

2 26 

3.18 

2.23 

3.12 


23 

4 28 

7.88 

3.42 

5.66 

3 03 

4.76 

2.80 

4.26 

2.64 

3.94 

2 53 

3.71 

2 45 

3.54 

2 38 

3.41 

2.32 

3.30 

2 28 

3.21 

2.24 

3.14 

2 20 

3.07 


24 

4 26 

7.82 

3 40 

5.61 

3.01 

4.72 

2.78 

4.22 

2 62 

3.90 

2.51 

3.67 

2 43 

3.50 

2 36 

3.36 

2 30 

3.25 

2.26 

3.17 

2.22 

3.09 

2.18 

3.03 


25 

4 24 

7.77 

3 38 

5.57 

2 99 

4.68 

2.76 

4.18 

2 60 

3.86 

2 49 

3.63 

2 41 
3.46 

2 34 

3.32 

2.28 

3.21 

2 24 

3.13 

2 20 

3.05 

2 16 

2.99 


26 

4 22 

7.72 

3.37 

5.53 

2 98 

4.64 

2 74 
4.14 

2 59 

3.82 

2 47 

3.59 

2.39 

3.42 

2.32 

3.29 

2.27 

3.17 

2.22 

3.09 

218 

3.02 

2.15 

2.96 





VALUES OF F

1% OR 99% IN Bold-Face Type
for greater variance


14 

16 

20 

24 

30 

40 

50 

75 

100 

200 

500 

∞

n2

245 

6,142 

246 

6,169 

248 

6,208 

249 

6,234 

250 

6,258 

251 

6,286 

252 

6,302 

253

6,323 

253 

6,334 

254 

6,352 

254 

6,361 

254 

6,366 

1 

19 42 

99.43 

19 43 

99.44 

19 44 

99.45 

19 45 

99.46 

19 46 

99.47 

19 47 

99.48 

19 47 
99.48 

19 48 

99.49 

19.49 

99.49 

19 49 

99.49 

19 50 

99.50 

19 50 

99.50 

2 

8 71 

26.92 

8 69 

26.83 

8.66 

26.69 

8 64 

26.60 

8 62 

26.50 

8 60 

26.41 

8 58 

26.35 

8 57 

26.27 

8 56 

26.23 

8 54 

26.18 

8 54 

26.14 

8 53 

26.12 

3 

5.87 

14.24 

5.84 

14.15 

5 80 
14.02 

5 77 
13.93 

5.74 

13.83 

5.71
13.74 

5 70 

13.69 

5 68 
13.61 

5.66 

13.57 

5 65 

13.52 

5 64 

13.48 

5 63 

13.46 

4 

4 64 

9.77 

4 60 

9.68 

4 56 
9.55 

4.53 

9.47 

4.50

9.38 

4.46 

9.29 

4 44 

9.24 

4 42 

9.17 

4.40 

9.13 

4.38 

9.07 

4 37 

9.04 

4 36 

9.02 

5 

3 96 

7.60 

3.92 

7.52 

3.87 

7.39 

384 

7.31 

3 81 

7.23 

3 77 

7.14 

3 75 

7.09 

3 72 

7.02 

3 71 

6.99 

3 69 

6.94 

3.68 

6.90 

3 67 

6.88 

6 

3 52 

6.35 

3.49

6.27 

3 44 

6.15 

3 41 

6.07 

3 38 

5.98 

3 34 

5.90 

3 32 

5.85 

3 29 

5.78 

3 28 

5.75 

3.25 

5.70 

3 24 

5.67 

3.23 

5.65 

7 

3 23 

5.56 

3.20

5.48 

3.15 

5.36 

3 12 

5.28 

3 08 

5.20 

3 05 
5.11 

3.03 

5.06 

3 00 

5.00 

2 98 

4.96 

2 96 

4.91 

2.94 

4.88 

2.93 

4.86 

8 

3 02 

5.00 

2 98 

4.92 

2 93 

4.80 

2.90 

4.73 

2 86 

4.64 

2 82 

4.56 

2 80 
4.51 

2 77 

4.45 

2 76 

4.41 

2.73 

4.36 

2 72 

4.33 

2 71 

4.31 

9 

2 86 

4.60 

2 82 

4.52 

2 77 

4.41 

2 74 

4.33 

2 70 

4.25 

2.67 

4.17 

2 64 
4.12 

2.61 

4.05 

2 59 

4.01 

2 56 

3.96 

2 55 

3.93 

2.54 

3.91 

10 

2.74 

4.29

2.70 

4.21 

2 65 

4.10 

2.61 

4.02 

2 57 

3.94 

2.53 

3.86 

2 50 

3.80 

2 47 

3.74 

2 45 

3.70 

2 42 

3.66 

2 41 

3.62 

2.40 

3.60 

11 

2 64 

4.05 

2 60 

3.98 

2 54 

3.86 

250 

3.78 

2 46 

3.70 

2.42 

3.61 

2 40 
3.56 

2 36 

3.49 

2 35 

3.46

2 32 

3.41 

2 31 

3.38 

2 30 

3.36 

12 

2 55 

3.85 

2 51 
3.78 

2 46 

3.67 

2 42 

3.59 

2 38 

3.51 

2.34 

3.42 

2 32 

3.37 

2 28 

3.30 

2 26 

3.27 

2 24 

3.21 

2 22 

3.18 

2 21 

3.16 

13 

2 48 

3.70 

2 44 

3.62 

2 39 

3.51 

2 35 

3.43 

2 31 

3.34 

2.27 

3.26 

2 24 
3.21 

2.21

3.14 

2 19 

3.11 

2 16 
3.06 

2 14 

3.02 

2 13 

3.00 

14 

2 43 

3.56 

2 39 

3.48 

2 33 

3.36 

2 29 

3.29 

2 25 

3.20 

2.21 

3.12 

2 18 
3.07 

2.15 

3.00 

2 12 

2.97 

2 10 

2.92 

2 08 

2.89 

2 07 

2.87 

15 

2 37 

3.45 

2 33 

3.37 

2 28 

3.25 

2 24 

3.18 

2 20 

3.10 

2 16 
3.01 

2 13 

2.96 

2 09 

2.89 

2 07 

2.86 

204 

2.80 

2 02 

2.77 

2 01 

2.75 

16 

2 33 

3.35 

2 29 

3.27 

2 23 

3.16 

2.19 

3.08 

2.15 

3.00 

2 11 
2.92 

2 08 

2.86 

2 04 

2.79 

2 02 

2.76 

199 

2.70 

1 97 

2.67 

1 96 

2.65 

17 

2 29 

3.27 

2 26 

3.19 

2.19 

3.07 

2 15 

3.00 

2 11 

2.91 

2 07 

2.83 

204 

2.78 

2 00 

2.71 

198 

2.68 

195 

2.62 

1.93 

2.59 

1.92 

2.57 

18 

2 26 

3.19 

2 21 

3.12 

2.15 

3.00 

211 

2.92 

2 07 

2.84 

2 02 

2.76 

200 

2.70 

196 

2.63 

194 

2.60 

191 

2.54 

1 90 

2.51 

1 88 

2.49 

19 

2.23 

3.13 

2 18 

3.05 

2 12 

2.94 

2 08 

2.86 

2 04 

2.77

1.99 

2.69 

1.96 

2.63 

192 

2.56 

1 90 

2.53 

1.87 

2.47 

1.85 

2.44 

184 

2.42 

20 

2.20 

3.07 

2.15 

2.99 

2.09 

2.88 

2 06 

2.80 

2.00 

2.72 

1 96 

2.63 

1 93 

2.58 

189 

2.51 

187 

2.47 

184 

2.42 

1 82 
2.38 

181 

2.36 

21 

2 18 

3.02 

2.13 

2.94 

2.07 

2.83 

2 03 

2.75 

198 

2.67 

1.93 

2.58 

191 

2.53 

187 

2.46 

1 84 

2.42 

181 

2.37 

1.80 

2.33 

1 78 

2.31 

22 

2 14 

2.97 

2 10 

2.89 

2.04 

2.78 

2.00 

2.70 

1.96 

2.62 

191 

2.53 

188 

2.48 

184 

2.41 

182 

2.37 

179 

2.32 

1.77 

2.28 

176 

2.26 

23 

2 13 

2.93 

2 09 

2.85 

2 02 

2.74 

1 98 

2.66 

1.94 

2.58 

1 89 

2.49 

186 

2.44 

182 

2.36 

180 

2.33 

176 

2.27 

1 74 
2.23 

173 

2.21 

24 

2 11 

2.89 

2 06 

2.81 

2.00 

2.70 

1 96 
2.62 

192 

2.54 

187 

2.45 

184 

2.40 

1.80 

2.32 

1.77 

2.29 

1.74 

2.23 

1 72 

2.19 

1 71 
2.17 

25 

2 10 
2.86 

2 06 

2.77 

199 

2.66 

1 96 

2.58 

1 90 
2.50 

185 

2.41 

1.82 

2.36 

178 

2.28 

1.76 

2.25 

1 72 
2.19 

1 70 
2.15 

1 69 

2.13 

26 


¹ Snedecor, G. W., Statistical Methods, pp. 184-187, 1940. Reprinted by permission.
TABLE 38 (Continued)

5% OR 95% IN Light-Face Type,
n1 degrees of freedom



n2

1 

2 

3 

4 

5 

6 

7 

8 

9 


11 

12 


27 

4.21

7.68 

3.35 

5.49 

2.96 

4.60 

2.73 

4.11 

2.57 

3.79 

2 46 

3.56 

2.37 

3.39 

2 30 

3.26 

2.25 

3.14 

2.20 

3.06 

2 16 

2.98 

2.13 

2.93 


28 

4.20 

7.64 

3.34 

5.45 

2 95 

4.57 

2.71 

4.07 

2 56 

3.76 

2.44 

3.53 

2.36 

3.36 

2 29 

3.23 

2 24 

3.11 

2.19 

3.03 

2.15 

2.95 

2 12 

2.90 


29 

4.18 

7.60 

3.33

5.42 

2.93 

4.54 

2.70 

4.04 

254 

3.73 

2 43 

3.50 

2 35 

3.33 

2 28 

3.20 

2.22 

3.08 

2 18 

3.00 

2 14 

2.92 

2 10 

2.87 


30 

4 17 

7.56 

3 32 

5.39 

2.92 

4.51 

2 69 

4.02 

2 53 

3.70 

2 42 

3.47 

2 34 

3.30 

2 27 

3.17 

2 21 

3.06 

2 16 

2.98 

2 12 

2.90 

2 09 

2.84 



4 15 

7.50 

3 30 

5.34 

2 90 

4.46 

2.67 

3.97 

2 51 

3.66 

2.40 

3.42 

2 32 

3.25 

2 25 

3.12 

2.19 

3.01 

2 14 

2.94 

2 10 

2.86 

2 07 

2.80 


34 

4 13 

7.44 

3 28 

5.29 

2.88 

4.42 

2 65 
3.93 

2.49 

3.61 

2 38 

3.38 

2 30 

3.21 

2 23 

3.08 

2 17 

2.97 

2 12 

2.89 

2 08 

2.82 

2.05 

2.76 

D 

e 

36 

4 11 
7.39 

3 26 

5.25 

2.86 

4.38 

2 63 

3.89 

2 48 

3.58 

2 36 

3.35 

2 28 

3.18 

2 21 

3.04 

2.15 

2.94 

2 10 

2.86 

2 06 

2.78 

2.03 

2.72 

g 

r 


4.10 

7.35 

3 25 

5.21 

2 85 

4.34 

2.62 

3.86 

2 46 

3.54 

2 35 

3.32 

2 26 

3.15 

2 19 

3.02 

2.14 

2.91 

2 09 

2.82 

2 05 

2.75 

2 02 

2.69 

e 

s 

40 

4 08 

7.31 

3.23 

5.18 

2 84 

4.31 

2 61 
3.83 

2 45 

3.51 

2 34 

3.29 

2.25 

3.12 

2 18 

2.99 

2 12 

2.88 

2 07 

2.80 

2 04 

2.73 

2 00 

2.66 

f 

r 

e 

42 

4 07 

7.27 

3.22 

5.15 

2.83 

4.29 

2.59 

3.80 

2.44

3.49 

2 32 

3.26 

2.24 

3.10 

2.17 

2.96 

2 11 

2.86 

2 06 

2.77 

2 02 

2.70 

199 

2.64 

44 

4 06 

7.24 

3 21 
5.12 

2 82 

4.26 

2.58 

3.78 

2 43 

3.46 

2 31 

3.24 

2 23 

3.07 

216 

2.94 

2 10 

2.84 

2 05 

2.75 

2 01 

2.68 

198 

2.62 

e 

d 

46 

4 05 
7.21 

3.20 

5.10 

2 81 

4.24 

2 57 

3.76 

2.42 

3.44 

2 30 
3.22 

2 22 

3.05 

2 14 

2.92 

2 09 

2.82 

2.04 

2.73 

2 00 

2.66 

197 

2.60 

m 

48 

4 04 

7.19 

3.19 

5.08 

2 80 

4.22 

2 56 

3.74 

2.41 

3.42 

2 30 

3.20 

2 21 

3.04 

2.14 

2.90 

2.08 

2.80 

2.03 

2.71 

1 99 

2.64 

1.96 

2.58 

1 

50

4 03 

7.17 

3 18 

5.06 

2 79 

4.20 

2.56 

3.72 

2.40 

3.41 

2 29 

3.18 

2 20 

3.02 

2 13 

2.88 

2 07 

2.78 

2.02 

2.70 

1 98 

2.62 

195 

2.56 

s 

s 

55

4 02 

7.12 

3 17 

5.01 

2 78 

4.16 

254 

3.68 

2 38 

3.37 

2 27 

3.15 

2 18 

2.98 

2 11 

2.85 

2.05 

2.75 

2.00 

2.66 

197 

2.59 

1 93 

2.53 

e 

r 

60 

400 

7.08 

3 15 

4.98 

2 76 

4.13 

2.52 

3.65 

2.37 

3.34 

2 25 

3.12 

2.17 

2.95 

2 10 

2.82 

2.04 

2.72 

1.99 

2.63 

1 95 

2.56 

1 92 

2.50 

V 

65 

3 99 

7.04 

3 14 

4.95 

2.75 

4.10 

2 51 
3.62 

2 36 

3.31 

2 24 

3.09 

2.15 

2.93 

2 08 

2.79 

2 02 

2.70 

198 

2.61 

194 

2.54 

1 90 

2.47 

a 

r 

70 

3 98 

7.01 

3 13 

4.92 

2 74 

4.08 

2 50 

3.60 

2 35 

3.29 

2.23 

3.07 

2.14 

2.91 

2 07 

2.77 

2 01 

2.67 

197 

2.59 

1 93 

2.51 

189 

2.45 

a 

n 

80 

3 96 

6.96 

3 11 

4.88 

2 72 

4.04 

2 48 

3.56 

2 33 

3.25 

2.21 

3.04 

2.12 

2.87 

2 05 

2.74 

199 

2.64 

195 

2.55 

191 

2.48 

1.88 

2.41 

c 

e 


3 94 

6.90 

3 09 

4.82 

2.70 

3.98 

2 46 

3.51

2 30 
3.20 

2.19 

2.99 

2.10

2.82 

2 03 

2.69 

1 97 
2.59 

192 

2.51 

188 

2.43 

1 85 

2.36 


125

3 92 

6.84 

3 07 

4.78 

2 68 

3.94 

244 

3.47 

2 29 

3.17 

2 17 

2.95 

2.08 

2.79 

2.01 

2.65 

1 95 

2.56 

1.90 

2.47 

186 

2.40 

1 83 

2.33 


150 

3 91 

6.81 

3 06 

4.75 

2.67 

3.91 

2 43 

3.44 

2 27 

3.14 

2 16 

2.92 

2.07 

2.76 

2.00 

2.62 

1.94 

2.53 

189 

2.44 

185 

2.37 

1 82 

2.30 


200 

3 89 

6.76 

3 04 
4.71 

2.65 

3.88 

2.41 

3.41 

2 26 

3.11 

2 14 

2.90 

2 05 

2.73 

1.98 

2.60 

1.92 

2.50 

187 

2.41 

183 

2.34 

1.80 

2.28 


400 

3 86 

6.70 

3 02 

4.66 

2 62 

3.83 

2 39 

3.36 

2 23 

3.06 

2 12 

2.85 

2.03 

2.69 

1.96 

2.55 

1.90 

2.46 

1 85 

2.37 

1.81 

2.29 

1 78 
2.23 



3 85 

6.66 

3 00 
4.62 

2 61 

3.80

2 38 

3.34 

2.22

3.04 

2.10 

2.82 

2 02 

2.66 

1 95 

2.53 

1.89 

2.43 

1 84 

2.34 

1.80 

2.26 

1 76 

2.20 


∞

3.84 

6.64 

2 99 

4.60

2 60 

3.78 

2.37 

3.32 

2.21

3.02 

2 09 

2.80 

2 01 

2.64 

1 94 

2.51 

1.88 

2.41 

1.83 

2.32 

1.79 

2.24 

175 

2.18 
VALUES OF F (Continued)
1% OR 99% IN Bold-Face Type
for greater variance


14 

16 

20 

24 

30 

40 

50 

75 

100 

200 

500 

∞

n2


2 08 

2 03 

1 97 

193 

1.88 

1.84 

1.80 

1 76 

1 74 

1 71 

1.68 

1 67 

27 


2.83 

2.74 

2.63 

2.55 

2.47 

2.38 

2.33 

2.25 

2.21 

2.16 

2.12 

2.10 



2 06 

2 02 

1 96 

191 

1 87 

181 

1 78 

175 

1 72 

1 69 

1 67 

1 65 

28 


2.80 

2.71 

2.60 

2.52 

2.44 

2.35 

2.30 

2.22 

2.18 

2.13 

2.09 

2.06 



2 05 

2 00 

1 94 

1 90 

185 

180 

1 77 

1 73 

1 71 

168 

165 

164 

29 


2.77 

2.68 

2.57 

2.49 

2.41 

2.32 

2.27 

2.19 

2.15 

2.10 

2.06 

2.03 



2 04 

1 99 

193 

1 89 

1.84 

1 79 

176 

172 

1.69 

1 66 

164 

162 

30 


2.74 

2.66 

2.55 

2.47 

2.38 

2.29 

2.24 

2.16 

2.13 

2.07 

2.03 

2.01 



2 02 

1 97 

1.91 

1 86 

1.82 

176 

1 74 

1 69 

1 67 

1 64 

1.61 

1 59 

32 


2.70 

2.62 

2.51 

2.42 

2.34 

2.25 

2.20 

2.12 

2.08 

2.02 

1.98 

1.96 



2 00 

1 95 

1 89 

184 

1 80 

1 74 

1 71 

1.67 

1 64 

1 61 

1 59 

1 57 

34 


2.66 

2.58 

2.47 

2.38 

2.30 

2.21 

2.15 

2.08 

2.04 

1.98 

1.94 

1.91 



1 98 

1.93 

1 87 

1 82 

1.78 

172 

1 69 

1 65 

1 62 

1 59 

1 56 

1 55 

36 

D 

2.62 

2.54 

2.43 

2.35 

2.26 

2.17 

2.12 

2.04 

2.00 

1.94 

1.90 

1.87 


e 

196 

1 92 

1.85 

1 80 

1.76 

171 

1 67 

1.63 

160 

1.57 

1 54 

1.53

38 


2.59 

2.51 

2.40 

2.32 

2.22 

2.14 

2.08 

2.00 

1.97 

1.90

1.86 

1.84 

r 

195 

190 

1.84 

1 79 

1 74 

169 

166 

161 

1.59 

1.55 

153 

1 51 

40 

e 

2.56 

2.49 

2.37 

2.29 

2.20 

2.11 

2.05 

1.97 

1.94 

1.88 

1.84 

1.81 

s 

1 94 

1.89 

1 82 

1.78 

1 73 

168 

1.64 

1 60 

1 57 

1.54 

1 51 

1 49 

42 


2.54 

2.46 

2.35 

2.26 

2.17 

2.08 

2.02 

1.94 

1.91 

1.85 

1.80 

1.78 

f 

192 

188 

181 

1 76 

1 72 

166 

1.63 

1 58 

1.56 

1 52 

1 50 

1 48 

44 

r 

2.52 

2.44 

2.32 

2.24 

2.15 

2.06 

2.00 

1.92 

1.88 

1.82 

1.78 

1.75 

e 

191 

1 87 

1 80 

1.75 

1 71 

165 

1 62 

157 

1 54 

1 51 

1 48 

1 46 

46 

1 ^ 

2.50 

2.42 

2.30 

2.22 

2.13 

2.04 

1.98 

1.90 

1.86 

1.80 

1.76 

1.72 

d 

1 90 

1 86 

1.79 

1 74 

1 70 

1 64 

161 

1 56 

1 53 

1 50 

1 47 

1.45 

48 

m 

2.48 

2.40 

2.28 

2.20 

2.11 

2.02 

1.96 

1.88 

1.84 

1.78 

1.73 

1.70 

1 90 

1 85 

1 78 

1.74 

1 69 

163 

1 60 

1 55 

1 52 

1 48 

1.46 

1 44 

50 

1 

2.46 

2.39 

2.26 

2.18 

2.10 

2.00 

1.94 

1.86 

1.82 

1.76 

1.71 

1.68 

1.88 

1 83 

1 76 

1 72 

1.67 

161 

1 58 

1 52 

1.50 

1 46 

1 43 

1 41 

55 

e 

s 

2.43 

2.35 

2.23 

2.15 

2.06 

1.96 

1.90 

1.82 

1.78 

1.71 

1.66 

1.64 

s 

1 86 

1 81 

1 75 

1.70 

1.65 

1.59 

156 

1 50 

1.48 

1 44 

1.41 

1 39 

60 

e 

2.40 

2.32 

2.20 

2.12 

2.03 

1.93 

1.87 

1.79 

1.74 

1.68 

1.63 

1.60 

r 

1 85 

1 80 

1 73 

1.68 

1.63 

157 

1 54 

149 

146 

1 42 

1 39 

1 37 

65 


2.37 

2.30 

2.18 

2.09 

2.00 

1.90 

1.84 

1.76 

1.71 

1.64 

1.60 

1.56 

V 

1 84 

1 79 

1.72 

167 

1.62 

1 56 

153 

147 

1 45 

1 40 

1 37 

1 35 

70 

a 

2.35 

2.28 

2.15 

2.07 

1.98 

1.88 

1.82 

1.74 

1.69 

1.62 

1.56 

1.53 

r 

1 82 

1 77 

1 70 

1 65 

1 60 

1 54 

151 

145 

1 42 

1 38 

1 35 

1 32 

80 

i 

a 

2.32 

2.24 

2.11 

2.03 

1.94 

1.84 

1.78 

1.70 

1.65 

1.57 

1.52 

1.49 

n 

1 79 

1.75 

1.68 

1.63 

1.57 

1 51 

148 

142 

139 

1 34 

1 30 

1 28 

100 

c 

2.26 

2.19 

2.06 

1.98 

1.89 

1.79 

1.73 

1.64 

1.59 

1.51 

1.46 

1.43 

e 

1 77 

1 72 

1 65 

1 60 

1.55 

1 49 

145 

139 

1 36 

1 31 

1 27 

1 26 

125 


2.23 

2.15 

2.03 

1.94 

1.85 

1.75 

1.68 

1.59 

1.54 

1.46 

1.40 

1.37 


1 76 

171 

1 64 

1 59 

1 54 

1 47 

1 44 

137 

134 

1 29 

1.25 

1 22 

150 


2.20 

2.12 

2.00 

1.91 

1.83 

1.72 

1.66 

1.56 

1.51 

1.43 

1.37 

1.33 


1 74 

1 69 

1 62 

1 57 

1 52 

145 

1 42 

135 

132 

1 26 

1.22 

1 19 

200 


2.17 

2.09 

1.97 

1.88 

1.79 

1.69 

1.62 

1.53 

1.48 

1.39 

1.33 

1.28 


1 72 
2.12 

1 67 

2.04 

1 60 
1.92 

1 54 

1.84 

1 49 

1.74 

1.42 

1.64 

1 38 
1.57 

132 

1.47 

1 28 

1.42 

1 22 

1.32 

1 16 
1.24 

1 13 

1.19 

400 


1 70 
2.09 

1 65 

2.01 

1 58 

1.89 

1 53 

1.81 

1 47 
1.71 

141 

1.61 

1 36 
1.54 

130 

1.44 

1 26 

1.38 

1 19 
1.28 

1.13 

1.19 

1 08 
1.11 

1,000 


1 69 

2.07 

1 64 

1.99 

1.57 

1.87 

1 52 

1.79 

146 

1.69 

1 40 

1.59 

1 35 

1.52 

1 28 

1.41 

1 24 
1.36 

1 17 
1.25 

1 11 
1.15 

1 00 

1.00 

00 



1 Snedecor, G. W , Statistical Methods, pp 184-187, 1940. Reprinted by permission 
This variation inside each grade sample means that if all six
grade samples were thrown into one large sample of 48 items, a
large part of the total variation in that combined sample would be
due to the intraclass variation. It is also evident, however, that
the average height of each succeeding class is clearly greater than
that of the previous class. The class totals and means are:

Grade    Total    Mean
  1       363     45.4
  2       388     48.5
  3       413     51.6
  4       427     53.4
  5       451     56.4
  6       474     59.2


From these two analyses it is evident that part of the total
variation is due to variation within the classes, and part of it is the
result of variation between the classes. If there were no variation
between the classes, all the variation would be within the several
grades. If, on the other hand, there were no variation within
any class, all the variation would be between the grades. If, for
instance, all first-grade children were 45 inches tall, all
second-grade pupils were 48 inches tall, all third-grade pupils were
52 inches tall, and all other grades followed the same pattern, all
the variation would be between grades. But since both kinds of
variation occur at the same time, the problem of variance is to
determine (1) how much of the total is due to within-class
variation, and (2) how much of the total is due to between-class
variation. This computation is made by (1) assuming the null
hypothesis of no difference among classes and computing the total
variance of all the data of all classes from their common mean,
(2) computing the amount of variation between the classes, and
(3) subtracting the between variation from the total variation to
obtain the within variation. The last step is to divide each of these
two sums by its degrees of freedom and then to divide the mean
between variation by the mean within variation, which gives the
ratio between the two portions of the total variance:

$$F = \frac{\text{variance between group averages}}{\text{variance within groups}}$$
If this ratio, F, is larger than could reasonably be accounted for
on the basis of chance sampling, it should be concluded that the
separate class samples are from different populations. If, on the
other hand, F is so small that the differences among the classes
can reasonably be accounted for on the basis of chance sampling,
it should be concluded that the samples are not significantly
different, but might all have been taken from the same
population.
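As a sketch of step (2) above, the between-classes sum of squares can be computed from the class totals alone: each squared total is divided by the number of items in its class, and the correction term (grand total squared, divided by the number of items) is subtracted. The example assumes eight pupils in each grade (48 items in six grades, as stated above) and uses the grade totals from the table. The within-classes portion cannot be computed here, because the individual heights of Worksheet No. 100 are not given in this section. The function name is illustrative only.

```python
def between_sum_of_squares(totals, n_per_class):
    """Sum of squares between classes, computed from class totals.

    Assumes every class has the same number of items (n_per_class).
    """
    grand_total = sum(totals)
    n_total = n_per_class * len(totals)
    correction = grand_total ** 2 / n_total          # (sum X)^2 / N
    return sum(t ** 2 for t in totals) / n_per_class - correction


grade_totals = [363, 388, 413, 427, 451, 474]  # grades 1-6, assumed 8 pupils each
ss_between = between_sum_of_squares(grade_totals, 8)
df_between = len(grade_totals) - 1             # 6 classes give 5 degrees of freedom
print(round(ss_between, 2), round(ss_between / df_between, 2))  # 1030.67 206.13
```

The second figure is the mean between variation, which would become the numerator of F once the within variation for the same data is known.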

The significance of the difference between only two means at a
time can be estimated by the method developed in Chapters 14 and
20, generally known as the Critical Ratio, but the analysis of
variance enables one to compare a larger number of means in one
computation.

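As later examples will show, more than one type of between
variation may be measured for the same data at the same time.
These refinements can greatly increase the accuracy of the tests of
the several between types of variation, because each source of
variation that is measured is removed from the unexplained
remainder against which the others are tested.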


WORKSHEET NO. 101

Summary of Cabbage Seed Treatment Tests
in Experiments in Six States

Number of Seeds Emerging

                         Treatments
States            Zno      Sem.     None      Total
Delaware           57       56       46
Florida            73       74       52
Louisiana          62       66       40
Maryland           70       68       55
New York           79       78       72
Virginia           60       67       53

Totals            401      409      318      1,128
Means            66.83    68.17    53.00     62.67


First Step. Computation of total sum of squares for six states

$$\sum X^2 = (57)^2 + (73)^2 + (62)^2 + \cdots + (53)^2 = 72{,}766$$

$$\frac{(\sum X)^2}{N} = \frac{(1{,}128)^2}{18} = \frac{1{,}272{,}384}{18} = 70{,}688$$

$$\sum x^2 = 72{,}766 - 70{,}688 = 2{,}078$$

Second Step. Computation of sum of squares between means.

$$\frac{(\sum X_1)^2 + (\sum X_2)^2 + (\sum X_3)^2}{6} - \frac{(\sum X)^2}{N} = \frac{160{,}801 + 167{,}281 + 101{,}124}{6} - 70{,}688 = 71{,}534.33 - 70{,}688 = 846.33$$

In the first step all of the data are thrown together in one
sample on the theory of the null hypothesis that there is no
significant difference in the results whether the cabbage seeds are
treated or not. It is assumed that just as many of the seeds will
grow when they are not treated as when they are. On this
assumption, the data are considered one sample and the squared
deviations of all the items from their common mean are computed.
Their sum is 2,078. This is the total sum of squares in the data.
This total is composed of two parts. One part is the result of
variation within each treatment. For instance, in the treatment Zno,
the data vary all the way from 57 to 79. In the treatment Sem.,
the variations are in the range of 56 to 78, and in the treatment
None, the variation spreads from 40 to 72. It is evident that part
of the total sum of squares of 2,078 is due to this variation within
the several treatments. But it is also evident that there is a
considerable variation between the means of the three treatments,
which are 66.83 for Zno, 68.17 for Sem., and 53.00 for None. The
problem of the analysis of variance is to compare the mean amount
of variance that results from between variation with the mean
amount that results from within variation. The ratio of the between
mean square to the within mean square is the measure of
significance.
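The first two steps can be checked with a short program. This is a minimal sketch, assuming the data are entered column by column from Worksheet No. 101; the dictionary layout and variable names are illustrative choices, not part of the worksheet.

```python
# Seeds emerging, Worksheet No. 101, in state order:
# Delaware, Florida, Louisiana, Maryland, New York, Virginia.
treatments = {
    "Zno": [57, 73, 62, 70, 79, 60],
    "Sem.": [56, 74, 66, 68, 78, 67],
    "None": [46, 52, 40, 55, 72, 53],
}

all_items = [x for column in treatments.values() for x in column]
n = len(all_items)                          # 18 items in all
correction = sum(all_items) ** 2 / n        # (sum X)^2 / N

# First step: total sum of squares about the common mean (null hypothesis).
ss_total = sum(x * x for x in all_items) - correction

# Second step: sum of squares between treatment means, from the column totals.
ss_between = sum(sum(col) ** 2 / len(col) for col in treatments.values()) - correction

print(round(correction), round(ss_total), round(ss_between, 2))  # 70688 2078 846.33
```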

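The method of computing the Critical Ratio of the difference
between two means was developed in Chapters 14 and 20 by means
of the formula:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$

that is, the difference between the two means divided by the
standard error of that difference.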
This formula limits one to the comparison of only two means at a
time. The analysis of variance provides a method of comparing
the variation among a group of several means at one time, or by
one computation. It also provides the means of splitting the
total variance into two parts: (1) the variation that results from
the scatter of items within classes, and (2) the variation that is
due to divergence between the means of the several classes.
The t-ratio, or Critical Ratio, may be used when one wishes
to compare only two classes at a time. The analysis of variance
should be used to make an over-all comparison among a group of
means, or comparisons among several means at once. The analysis
of variance may also be used to compare only two means, but the
Critical Ratio is limited to two at one time.
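For two groups the two methods agree: when the within-groups (pooled) variance is used in the denominator, the F-ratio from an analysis of variance equals the square of the t-ratio. The sketch below illustrates this with the Sem. and None columns of Worksheet No. 101. The pooled-variance form of t used here is an assumption, since the exact formula of Chapters 14 and 20 is not reproduced in this section.

```python
import math

sem = [56, 74, 66, 68, 78, 67]
none = [46, 52, 40, 55, 72, 53]


def sum_sq_dev(values):
    """Sum of squared deviations from the group's own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


n1, n2 = len(sem), len(none)
mean1, mean2 = sum(sem) / n1, sum(none) / n2

# Within-groups (pooled) variance, with n1 + n2 - 2 degrees of freedom.
pooled_var = (sum_sq_dev(sem) + sum_sq_dev(none)) / (n1 + n2 - 2)

# t-ratio: difference of the means over the standard error of that difference.
t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Analysis of variance on the same two groups (1 degree of freedom between).
grand_mean = (sum(sem) + sum(none)) / (n1 + n2)
ss_between = n1 * (mean1 - grand_mean) ** 2 + n2 * (mean2 - grand_mean) ** 2
f_ratio = (ss_between / 1) / pooled_var

print(round(t, 3), round(f_ratio, 3), round(t * t, 3))
```

The last two printed figures agree, which is why the analysis of variance can replace the Critical Ratio for two means but not the other way round.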

The sum of the squared deviations within the three separate
treatments, each computed from its own treatment mean, may be
verified as follows:

Treatment      Squared Deviations
               Within Treatments
Zno                 362.83
Sem.                284.84
None                584.00
Total             1,231.67

Computed by the method of variance, that is, by subtracting the sum
of squares between means from the total sum of squares, the total
within squared deviations are also 1,231.67, as shown below.
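The same check can be sketched in code: the within sums are computed treatment by treatment and compared with the total sum of squares minus the between sum of squares. As before, the data layout and names are illustrative assumptions, not part of the worksheet.

```python
treatments = {
    "Zno": [57, 73, 62, 70, 79, 60],
    "Sem.": [56, 74, 66, 68, 78, 67],
    "None": [46, 52, 40, 55, 72, 53],
}

# Within each treatment: squared deviations from that treatment's own mean.
within = {}
for name, values in treatments.items():
    mean = sum(values) / len(values)
    within[name] = sum((v - mean) ** 2 for v in values)
ss_within = sum(within.values())

# By subtraction: total sum of squares less the between-means sum of squares.
all_items = [x for values in treatments.values() for x in values]
correction = sum(all_items) ** 2 / len(all_items)
ss_total = sum(x * x for x in all_items) - correction
ss_between = sum(sum(v) ** 2 / len(v) for v in treatments.values()) - correction

for name, ss in within.items():
    print(f"{name:5} {ss:8.2f}")
print(f"Total {ss_within:8.2f}  by subtraction {ss_total - ss_between:8.2f}")
```

The program gives 284.83 for Sem.; the 284.84 in the table above appears to have been rounded so that the three entries add exactly to 1,231.67.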


TABLE 39

Computation of F, Test of Significance of Difference
Among Treatment Means

                       Degrees of    Sum of      Mean
                        Freedom      Squares    Squares
Total                      17        2,078
Between Treatments          2          846.33    423.16
Within Treatments          15        1,231.67     82.11

F = 423.16 / 82.11 = 5.15
Whether the within-treatments variation is computed for each
class (treatment) separately and then summed, or is computed by
subtracting the between variation from the total variation, the
results are identical. In this case, both are 1,231.67. This
agreement is a check on the accuracy of the computations.

In this case, the results are significant. Checking Table 38,
Values of F, for 2 degrees of freedom (df) across the top of the
table and 15 degrees of freedom down the left-hand side, one finds
3.68 for the 5% level of significance and 6.36 for the 1% level.
Since our F-value of 5.15 lies between these two, we conclude that
our results are significant, but not large enough to be highly
significant. A larger sample might give a highly significant result.
But from this small sample it may be concluded that the cabbage
seeds which were treated germinated significantly better than those
which were not treated.
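The reading of Table 38 described above can be sketched as a few lines of code. The sums of squares, degrees of freedom, and the tabled values 3.68 and 6.36 are taken from Table 39 and the paragraph above; the labels for the verdict follow the text's usage of "significant" and "highly significant".

```python
# Sums of squares and degrees of freedom from Table 39.
ss_between, df_between = 846.33, 2
ss_within, df_within = 1231.67, 15

ms_between = ss_between / df_between   # mean square between treatments
ms_within = ss_within / df_within      # mean square within treatments
f_ratio = ms_between / ms_within

# Tabled values of F for 2 and 15 degrees of freedom (Table 38).
f_05, f_01 = 3.68, 6.36

if f_ratio >= f_01:
    verdict = "highly significant"
elif f_ratio >= f_05:
    verdict = "significant"
else:
    verdict = "not significant"

print(round(f_ratio, 2), verdict)  # 5.15 significant
```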

It is possible to use the analysis of variance on a wide variety
of sociological, economic, and business problems, such as
population groups, advertising, current ratios, sales expenses, and
class differences in education. In most cases the analysis of
variance would not be a sufficiently complete treatment for data in
the social sciences, but it might be used as a preliminary method
to discover whether the data would justify a more complete analysis.

WORKSHEET NO. 102

Analysis of Variance of Size of Families in Large Cities,
Small Cities, and Rural Counties in the Corn Belt

Rows      Large Cities   Small Cities   Rural Counties    Total
  1           3.84           3.60            3.92         11.36
  2           3.68           3.48            4.15         11.31
  3           3.71           3.51            4.23         11.45
  4           3.74           3.18            4.52         11.44
  5           4.00           3.35            3.97         11.32
  6           3.96           3.46            3.87         11.29
  7           3.69           3.25            4.36         11.30
  8           3.82           3.60            3.88         11.30
Totals       30.44          27.43           32.90         90.77
1st Step. Computation of total sum of squares, 3 groups

$$\sum x^2 = \sum X^2 - \frac{(\sum X)^2}{N}$$

$$\sum X^2 = (3.84)^2 + (3.68)^2 + \cdots + (3.88)^2 = 345.8633$$

$$\sum X = 90.77$$

$$\frac{(\sum X)^2}{N} = \frac{(90.77)^2}{24} = \frac{8{,}239.1929}{24} = 343.2997$$

$$\sum x^2 = 345.8633 - 343.2997 = 2.5636, \text{ total variance}$$

2d Step. Sum of squares between groups

$$\frac{\sum (\text{Class Sums})^2}{8} - \text{Correction} = \frac{(30.44)^2 + (27.43)^2 + (32.90)^2}{8} - 343.2997 = \frac{2{,}761.4085}{8} - 343.2997$$

$$345.1761 - 343.2997 = 1.8764, \text{ between-groups variance}$$

3d Step. Sum of squares between rows

$$\frac{\sum (\text{Row Totals})^2}{3} - \text{Correction}$$

$$(11.36)^2 + (11.31)^2 + (11.45)^2 + \cdots + (11.30)^2 = 1{,}029.9283$$

$$\frac{1{,}029.9283}{3} - 343.2997 = 343.3094 - 343.2997 = .0097, \text{ between-rows variance}$$


Computation of F, Test of Significance of Difference Between
Groups, Between Rows, and Experimental Error

                        Degrees of    Sum of     Mean
                         Freedom      Squares   Squares
Total                       23        2.5636
Between Groups               2        1.8764     .9382
Between Rows                 7         .0097     .0014
Experimental Error          14         .6775     .0484

Groups F = .9382 / .0484 = 19.4, highly significant
Rows   F = .0014 / .0484 = 0.03, not significant
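The three steps and the table above can be reproduced with a short sketch of the rows-by-groups analysis. It assumes one observation per cell, as in Worksheet No. 102, and obtains Experimental Error by subtraction, as the text does; the variable names are illustrative.

```python
# Worksheet No. 102: size of family; each row lists
# (Large Cities, Small Cities, Rural Counties).
rows = [
    (3.84, 3.60, 3.92),
    (3.68, 3.48, 4.15),
    (3.71, 3.51, 4.23),
    (3.74, 3.18, 4.52),
    (4.00, 3.35, 3.97),
    (3.96, 3.46, 3.87),
    (3.69, 3.25, 4.36),
    (3.82, 3.60, 3.88),
]
n_rows, n_groups = len(rows), len(rows[0])
items = [x for row in rows for x in row]
correction = sum(items) ** 2 / len(items)

ss_total = sum(x * x for x in items) - correction
group_totals = [sum(row[j] for row in rows) for j in range(n_groups)]
ss_groups = sum(t * t for t in group_totals) / n_rows - correction
ss_rows = sum(sum(row) ** 2 for row in rows) / n_groups - correction
ss_error = ss_total - ss_groups - ss_rows      # Experimental Error, by subtraction

df_groups, df_rows = n_groups - 1, n_rows - 1
df_error = (n_groups - 1) * (n_rows - 1)       # 2 x 7 = 14
ms_error = ss_error / df_error

f_groups = (ss_groups / df_groups) / ms_error
f_rows = (ss_rows / df_rows) / ms_error
print(round(ss_total, 4), round(ss_groups, 4), round(ss_rows, 4), round(ss_error, 4))
print(round(f_groups, 1), round(f_rows, 2))
```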
EXPERIMENTAL ERROR

An additional step is introduced in this problem in order to
check the uniformity of the rows of data. If there were any real
relationship among the items of data in a row, that is, if the large
city, small city, and rural county family sizes in each row were
logically associated, a large portion of the total variance might be
associated with the rows instead of with the groups. To measure
how much of the total is associated with the rows, the third step is
introduced into the problem. In this particular problem the rows
have no logical and no actual significance, as the test shows. In
many problems on which the student will work, however, this
cross-relation will be very important and should be measured and
subtracted from the within variance, which will hereafter be called
Experimental Error. It is the total error caused by chance sampling
and other errors, all lumped together as a total unexplained
residual. It is always the denominator of the F-ratio against which
all computed variances are measured.

In setting up a problem for solution by the analysis of variance,
the experimental design should eliminate as far as possible all
extraneous factors. If one wished to measure the effects of
various amounts of artificial light in hen houses on egg production
in the winter season, hens should be chosen for the experiment
which are as nearly as possible equal in age, weight, kind of feed,
and previous treatment and condition. They should be of the
same breed and otherwise identical. If one group is to be given
only eight hours of light while the other group is given sixteen or
twenty hours of light per day, in order to test the effect of light on
egg production, all other factors should be held constant. If they
are not held constant, part of the difference in egg production may
be due to some of these other variables and not to the difference in
the number of light hours per day. If, for instance, one group of
hens were one year old and the other group were three years old,
the difference in egg production might be due primarily to age
and not to light. If the feed of one group were superior to that
of the other group, the difference in egg production might be
due to feed and not to light. It is evident that if the effect of
longer days on egg production due to increased artificial light is
to be measured accurately, age, feed, breed, and all other factors
must be held constant.

In spite of every effort to hold all extraneous factors constant, it
is often impossible to do so. Fortunately, a statistical device is
available whereby such uncontrolled differences may be computed
out and the results held constant in spite of them. In
Worksheet No. 100 on the heights of school children, no such
correction or control was used. In Worksheet No. 102 on size of
families, the variation between rows caused by any factors was
computed and subtracted out of the total variance before the
ratio between size-of-family variance and experimental error was
computed. In this case, the mean square among the rows was quite
small, .0014, but in many problems it may be quite large. In any
case in which there is a logical basis for assuming a correlation
between the several columns, it is a more accurate procedure to
compute it and remove it from the within variance before F is
computed for the columns. The method of Worksheet No. 102 is,
therefore, better than that of Worksheet No. 100.

WIDER POSSIBILITY OF STUDY 

The method of computing out the influence of the extraneous 
and unequal factors enables one to use more data and to widen 
his research field and to work on problems which would other- 
wise be excluded. For instance, if it were impossible to obtain a 
sufficient number of hens of the same age or breed, one could 
use hens of various ages and breeds by computing out the average 
effect of the unequal factors, and thus arrive at a satisfactory re- 
sult. Often this is the only method on which one can proceed. 

After all the unequal and extraneous elements that can be
identified have been computed out of the variance, all the rest
of the variance is lumped together and called Experimental Error.
Theoretically, experimental error should include only that error
which is due to chance sampling, but actually in most problems
it also includes other elements of variance which the statistician
is not able to segregate and remove. Experimental error is in
any case the base or denominator of the F-ratio of significance.

The principles involved in the egg production problem would, 
of course, apply equally well in all problems in the physical, bio- 
logical, or social sciences. The principles are universal. 

PRACTICAL RESEARCH PROBLEM: SIGNIFICANCE
OF TREATMENTS OF COTTON SEED

In Worksheet No. 103 a much more complex and practical 
problem is introduced in which five separate elements of variance 
are analyzed; namely, series, varieties, chemicals, varieties and 
chemicals, and experimental error. This is an actual problem 
worked out by Dr. K. Starr Chester and Dr. W. W. Ray of the 
Botany Department of Oklahoma A. and M. College, to deter- 
mine the significance of treating cotton seed for disease resistance. 
Delinted-graded cotton seed and ordinary cotton seed with the 
lint left on it by the gin were both subjected to identical chemi- 
cal treatments to determine the effect on plants emerging and 
surviving. The chemical treatments proved to be highly signifi- 
cant in preventing disease, especially on the delinted cotton. 

The computations for Worksheet No. 103 are given in detail 
below as a guide to the student in organizing such a complex 
problem. There are five steps in the solution. 

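1st Step.

$$\sum X^2 = (81)^2 + (90)^2 + (30)^2 + \cdots + (17)^2 = 127{,}703$$

$$\frac{(\sum X)^2}{N} = \frac{(2{,}005)^2}{48} = \frac{4{,}020{,}025}{48} = 83{,}751$$

$$\sum x^2 = 127{,}703 - 83{,}751 = 43{,}952, \text{ total sum of squares}$$

2d Step.

$$\frac{(482)^2 + (466)^2 + (496)^2 + (561)^2}{12} = 84{,}185$$

$$84{,}185 - 83{,}751 = 434, \text{ series sum of squares}$$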
WORKSHEET NO. 103

The Percentage of Germination of Fuzzy and Delinted-Graded
Cotton Seed Treated with Various Chemicals and
Planted in Soil Infested with Rhizoctonia

Variety and                    Series
Chemical              1      2      3      4     Total

Delinted-graded
  Ceresan            81     85     83     94      343
  Spergon            90     80     91     99      360
  Spergonex          30     18     16     29       93
  Cyanamid           15     21     11     23       70
  Sanoseed           26     25     29     28      108
  Check              15     16     24     31       86
  Variety Total                                  1,060

Fuzzy
  Ceresan            79     70     84     79      312
  Spergon            81     70     83     67      301
  Spergonex          15     31      8     13       67
  Cyanamid           25     32     30     63      150
  Sanoseed           12     10     14     18       54
  Check              13      8     23     17       61
  Variety Total                                    945

Series Total        482    466    496    561

Grand Total                                      2,005

Chemical Totals: Ceresan 655**, Spergon 661**, Spergonex 160,
Cyanamid 220, Sanoseed 162, Check 147

ΣX = 2,005     ΣX² = 127,703     CF = 83,751
(Series) 84,185     (Varieties) 84,026     (Chem.) 123,475     (V & C) 125,357

Variate         df      Σx²     Mean Square         F-Value
                                               Calc.     5%     1%
Total           47    43,952
Series           3       434        145         2.50     2.89   4.44
Varieties        1       275        275         4.74     4.14   7.47
Chemicals        5    39,724      7,945       137.00**   2.50   3.64
V & C Joint      5     1,607        321         5.54**   2.50   3.64
Error           33     1,912       58.0

** These are the highly significant or important factors.

Worksheet No. 103 illustrates how a complex combination of 
data can be explained by the analysis of variance. 
3d Step.

$$\frac{(1{,}060)^2 + (945)^2}{24} = \frac{2{,}016{,}625}{24} = 84{,}026$$

$$84{,}026 - 83{,}751 = 275, \text{ varieties sum of squares}$$

4th Step.

$$\frac{(655)^2 + (661)^2 + (160)^2 + (220)^2 + (162)^2 + (147)^2}{8} = \frac{987{,}799}{8} = 123{,}475$$

$$123{,}475 - 83{,}751 = 39{,}724, \text{ chemicals sum of squares}$$

5th Step.

$$\frac{(343)^2 + (360)^2 + (93)^2 + \cdots + (61)^2}{4} = \frac{501{,}429}{4} = 125{,}357$$

$$125{,}357 - 83{,}751 = 41{,}606, \text{ total variance of 12 sub-classes}$$

$$41{,}606 - (39{,}724 + 275) = 1{,}607$$

That is,

(Total variance of the 12 sub-classes of 2 varieties and 6 chemicals)
− (Chemicals plus varieties sums of squares)
= (Joint varieties and chemicals sum of squares)
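All five steps for Worksheet No. 103 can be sketched in one program. This is a minimal illustration, assuming the data are entered as printed in the worksheet (four series per chemical); the nested-dictionary layout and the helper name `ss_from_totals` are illustrative choices. Experimental Error is found by subtraction, as in the text.

```python
# Worksheet No. 103: germination by chemical; each list holds series 1-4.
data = {
    "Delinted-graded": {
        "Ceresan": [81, 85, 83, 94], "Spergon": [90, 80, 91, 99],
        "Spergonex": [30, 18, 16, 29], "Cyanamid": [15, 21, 11, 23],
        "Sanoseed": [26, 25, 29, 28], "Check": [15, 16, 24, 31],
    },
    "Fuzzy": {
        "Ceresan": [79, 70, 84, 79], "Spergon": [81, 70, 83, 67],
        "Spergonex": [15, 31, 8, 13], "Cyanamid": [25, 32, 30, 63],
        "Sanoseed": [12, 10, 14, 18], "Check": [13, 8, 23, 17],
    },
}

cells = [(v, c, plots) for v, chems in data.items() for c, plots in chems.items()]
items = [x for _, _, plots in cells for x in plots]
cf = sum(items) ** 2 / len(items)          # correction factor, about 83,751


def ss_from_totals(totals, items_per_total):
    """Sum of squares from totals that each contain the same number of items."""
    return sum(t * t for t in totals) / items_per_total - cf


ss_total = sum(x * x for x in items) - cf
series_totals = [sum(plots[s] for _, _, plots in cells) for s in range(4)]
variety_totals = [sum(sum(p) for v2, _, p in cells if v2 == v) for v in data]
chem_names = list(data["Fuzzy"])
chem_totals = [sum(sum(p) for _, c2, p in cells if c2 == c) for c in chem_names]

ss_series = ss_from_totals(series_totals, 12)        # 2d step
ss_varieties = ss_from_totals(variety_totals, 24)    # 3d step
ss_chemicals = ss_from_totals(chem_totals, 8)        # 4th step
ss_subclasses = ss_from_totals([sum(p) for _, _, p in cells], 4)
ss_joint = ss_subclasses - ss_varieties - ss_chemicals   # 5th step, V & C joint
ss_error = ss_total - ss_series - ss_varieties - ss_chemicals - ss_joint

for label, ss in [("Total", ss_total), ("Series", ss_series),
                  ("Varieties", ss_varieties), ("Chemicals", ss_chemicals),
                  ("V & C Joint", ss_joint), ("Error", ss_error)]:
    print(f"{label:12} {ss:10.1f}")
```

Because the program keeps the decimals that the hand computation drops when it rounds the correction term to 83,751, its figures (for example 275.5 for Varieties and 1,911.5 for Error) differ slightly from the worksheet's.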

TABLE 40

Computation of F

Variate         df     Squared       Mean       F-Values
                       Deviations    Squares
Total           47     43,952
Series           3        434          145         2.50
Varieties        1        275          275         4.74
Chemicals        5     39,724        7,945       137.00
V & C Joint      5      1,607          321         5.54
Error           33      1,912         58.0



The five problems explained in this chapter offer the student a 
sufficiently complete understanding of the analysis of variance to 
enable him to use this method effectively. There are, of course, 
still more complex variations of analysis possible, but the student 
who expects to specialize in research will take at least a second 
course in statistics. 
DETAILED ANALYSIS OF WORKSHEET NO. 103
Data

The data in this problem are the percentages of germination of 
(1) fuzzy and (2) delinted cotton seed, separate plots of which 
are treated with five chemicals as follows: (1) Ceresan, (2) Sper- 
gon, (3) Spergonex, (4) Cyanamid, (5) Sanoseed, and (6) one 
Check plot not treated. Separate plots of the fuzzy and of the 
delinted each receive the same chemical treatment. 

Objectives of Analysis 

The several purposes in mind in making this analysis are to
determine (1) whether the several chemical treatments produce
significantly different results on the (a) fuzzy and on the
(b) delinted cotton; (2) whether there are any significant differences
among the results produced by the five different chemicals;
(3) whether there is any significant difference between the Check,
or non-chemical, plot in each case and the plots on which
chemically treated seeds were planted; (4) whether there is any
combined or joint association of varieties and chemicals that is
significant. On this point it may be said that the chemicals might
have a more direct and powerful effect on the delinted seed;
(5) to reduce the error of chance sampling by providing a larger
number of cases, for which purpose four separate plots were used,
as is indicated by the four columns of data. A still larger number
of items or plots, as could have been provided by a 6 × 6 Latin
Square for each variety (fuzzy and delinted), would have been still
better; (6) to segregate and measure Experimental Error as the
basis of the F-ratios of significance.

Computations for Analysis 

For these six objectives, six steps of analysis are required as 
follows: 

First Step: Computation of Total Variance for all the data based 
on the null-hypothesis. Total Variance = 43,952. 
Second Step: Computation of variance between the four columns
of data, or four series. Series Variance = 434. This
computation removes from the Experimental Error any element
of variance due to variations in soil, rain, methods of
cultivation, etc., among the plots or series (columns of data in
this case). Theoretically, if there were no variations in the
percentage of germination due to plot, soil, rain, or other
extraneous variations, the germination in the parallel plots should
be the same. Since they are not identical, it is assumed that they
contain the effects of some extraneous factors, which must be
removed for an accurate measurement of the other desired
variances.

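Third Step: Computation of variance between varieties (fuzzy and
delinted in this case). Variety variance = 275. This result
measures the difference in germination between the two varieties,
averaged over all the chemical treatments: the delinted-graded
seed totaled 1,060 against 945 for the fuzzy. Whether the
chemicals were more effective on one variety than on the other is
a separate question, answered by the joint variance of the Fifth
Step.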

Fourth Step: Computation of variation among the effects of the
chemicals. Chemicals variance = 39,724, which is by far the
largest portion of the total variance. It is quite evident that
(1) the results on the plots treated with Ceresan and Spergon
(chemical totals of 655 and 661) were far superior to those of the
Check plots, on which no chemicals were used (total 147), and
(2) that some of the chemicals were much more effective than
others. The divergence among the effects of the chemicals is
responsible for most of the total variance. This is the decisive
factor in the problem.

Fifth Step: Computation of joint effects of varieties and chemicals.
Joint V and C Variance = 1,607. This step in the analysis is
an important one. Sometimes the relationship between two or
more factors is multiplicative instead of additive. In such a case,
when two are combined, the result may be greater than their sum.
In this case the fuzzy cotton might not yield as readily to the
benefits of the chemicals as does the delinted cotton. The extent
to which this relationship holds in this problem is revealed by a
computation of the variance of all twelve sub-classes, the six
chemical treatments under each of the two varieties. In this case
the result is highly significant (F = 5.54, against 3.64 required at
the 1% level, as Table 41 shows), and the student should know
how to compute the value.
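One way to see what the joint term measures: with only two varieties, it depends on how much the delinted-minus-fuzzy difference changes from chemical to chemical. If that difference were the same for every chemical, the joint sum of squares would be zero. The sketch below uses the sub-class totals of Worksheet No. 103. The shortcut it uses (sum of squared deviations of the six differences, divided by 8 = 2 varieties × 4 series) is an algebraic rearrangement of the Fifth Step that holds only when one factor has two levels; it is not a method given in the text.

```python
chemicals = ["Ceresan", "Spergon", "Spergonex", "Cyanamid", "Sanoseed", "Check"]
delinted_totals = [343, 360, 93, 70, 108, 86]   # sub-class totals, 4 series each
fuzzy_totals = [312, 301, 67, 150, 54, 61]

# Delinted-minus-fuzzy difference for each chemical.
diffs = [d - f for d, f in zip(delinted_totals, fuzzy_totals)]
mean_diff = sum(diffs) / len(diffs)

# Joint (V & C) sum of squares: spread of the differences about their mean,
# divided by 2 varieties x 4 series = 8 (valid for a two-level factor only).
ss_joint = sum((d - mean_diff) ** 2 for d in diffs) / 8

for name, d in zip(chemicals, diffs):
    print(f"{name:10} {d:5d}")
print(round(ss_joint, 1))
```

Cyanamid is the only chemical for which the fuzzy seed did better (a difference of -80); its departure from the average difference accounts for most of the joint sum of squares here.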

Sixth Step: By means of the five computations above the total 
variance and the separate portions of the total variance which 
are associated with or are the results of the several factors in- 
cluded in the problem are segregated and measured. The 
exact portion of the total variance related to each factor is now 
known. To complete the analysis it is necessary to reduce 
these separate portions of the total variance to ratios with a 
common base. Since the number of items or more exactly the 
degrees of freedom associated with the different factors in the 
problem are not equal, it is first necessary to reduce each sub- 
total of variance to mean variance so that all the separate por- 
tions of the variance will be on exactly the same basis and in 
identical units. The final step is to reduce all these separate 
mean variances to comparable ratios by dividing each one by 
the mean Experimental Error. The Experimental Error, as 
was stated above, is that portion of the total variance which is 
due to chance sampling and to all other unidentified variation. 
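The Sixth Step can be sketched directly from the sums of squares and degrees of freedom of Table 40: each sum is reduced to a mean variance, and each mean variance is divided by the mean Experimental Error. The dictionary layout is an illustrative choice.

```python
# Sums of squares and degrees of freedom from Table 40.
sources = {
    "Series": (434, 3),
    "Varieties": (275, 1),
    "Chemicals": (39_724, 5),
    "V & C Joint": (1_607, 5),
}
error_ss, error_df = 1_912, 33

ms_error = error_ss / error_df          # mean Experimental Error
f_values = {}
for name, (ss, df) in sources.items():
    ms = ss / df                        # reduce each sub-total to a mean variance
    f_values[name] = ms / ms_error      # ratio to the common base
    print(f"{name:12} MS = {ms:8.1f}  F = {f_values[name]:6.2f}")
```

Because the program does not round the mean squares to 145, 7,945, 321, and 58.0 before dividing, some of its F-values differ from Table 40 in the second decimal (for example 4.75 for Varieties and 137.12 for Chemicals); the conclusions drawn from them are unchanged.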

TABLE 41

Significance of F-Ratios
in Fuzzy and Delinted Cotton Seed Problem

                              F-Values
                    Computed     Required for
                                 5%       1%
Series                2.50      2.89     4.44
Varieties             4.74      4.14     7.47
Chemicals           137.00      2.50     3.64
V. and C. Joint       5.54      2.50     3.64


From the above table it is evident (1) that the variance of 
varieties is significant but not highly significant, (2) that chem- 
icals variance is most highly significant, (3) that V and C Joint 
variance is highly significant, and (4) that Series or plot vari- 
ance is not significant. 
CONCLUSION

If the mean variance for a factor is greater than could
reasonably be expected to occur from chance sampling, the
difference is considered significant. “Significant,” in the
statistical sense, means a difference so large that it would occur
by chance no more than one time in 20, or in 5% of the cases.
“Highly significant” is a statistical term used to indicate a
difference so great that it would occur by chance no more than one
time in 100, or in 1% of the cases. Experimental Error is that
variation which is the result of chance sampling. The ratio of any
other given mean variance to the mean Experimental Error,
therefore, indicates how far the variance of that factor exceeds
what could have occurred by chance. This is the F-ratio.

R. A. Fisher and George W. Snedecor have worked out the
mathematical limits of chance sampling occurrences for each
combination of degrees of freedom. These ratios for both the 5%
level of chance and the 1% level of chance are given in Table 38,
Values of F. By comparing the computed F-ratio for any problem
with the values in Table 38 for the same degrees of freedom,
one can determine whether his computed F-ratio is significant or
highly significant. The farther the computed ratio lies above the
required ratio, the more certain one can be of the dependability
of his results.
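Where do the tabled values come from? In general they require the full F distribution, but when the greater variance has only 2 degrees of freedom there is a closed form: the chance that F exceeds a value f is (1 + 2f/n₂)^(−n₂/2), where n₂ is the number of degrees of freedom of the lesser variance. The sketch below uses that special-case formula, a standard property of the F distribution rather than something derived in the text, to recover the 3.68 and 6.36 read from Table 38 for 2 and 15 degrees of freedom, and the chance of an F as large as the 5.15 found for the cabbage seed. The function names are illustrative.

```python
def tail_probability_2df(f, n2):
    """Chance that F with 2 and n2 degrees of freedom exceeds f (special case n1 = 2)."""
    return (1 + 2 * f / n2) ** (-n2 / 2)


def critical_value_2df(alpha, n2):
    """F (2 and n2 df) exceeded by chance only in the fraction alpha of cases."""
    return (n2 / 2) * (alpha ** (-2 / n2) - 1)


print(round(critical_value_2df(0.05, 15), 2))   # 3.68
print(round(critical_value_2df(0.01, 15), 2))   # 6.36
print(round(tail_probability_2df(5.15, 15), 3))
```

The last figure, about 0.02, lies between 0.01 and 0.05, which is why the cabbage seed result was judged significant but not highly significant.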


SUMMARY 

1. Variance is the square of the standard deviation, or the mean of
the squared deviations from the mean.

2. The null-hypothesis is the assumption that there is no significant 
difference between two or more series of data. 

3. The analysis of variance is a method of reducing to a ratio (1) the
portion of total variance due to variation between the means of the groups
of data and (2) the portion of the total variance which falls within the
groups of data, with this second portion as the base of the ratio. The
variance is said to be significant when the obtained ratio is so large that
it could be accounted for by chance sampling no more than one time in twenty.

4. The method of analysis of variance makes it possible to make com- 
parisons among any number of groups of data at once. 
5. The analysis of variance is especially useful in computing the degree
of significance among the results obtained from planned experiments,
particularly in agriculture, based on the Latin Square or other types of
replications.

6. The analysis of variance does not give an exact measure of the per- 
centage of relationship between two variables as provided by the coefficient 
of determination, but only a ratio which by means of the F-table may 
be listed as (1) not significant, (2) significant, (3) highly significant, or 
(4) clearly larger than a ratio of high significance. 

7. The analysis of variance may be used in the fields of economics, 
sociology, education, psychology, political science or other fields of social 
science. It is a tool, however, which should be applied to carefully 
planned research projects providing exact data. 
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REVIEW QUESTIONS 

1. Define variance. How is it related to o-? What does it measure? 

2. Explain the principle underlying the null-hypothesis. Illustrate. 

3. What is meant by degrees of freedom In connection with 
what kinds of computations is it used? Why? 

4. What is ^‘within-group^^ variance? 

5. What is “between-groups’’ variance? 

6. What is the relationship of “between-groups” variance to “within-group” variance?

7. What is the meaning of the F-coefficient? How is it computed? 

8. What is “Experimental Error”? How is it computed?

9. For what reason is the variance computed for the “rows,” or
crosswise, as well as for the classes or groups?

10. What is the advantage of computing several cross or “inter”
variance relations?
EXERCISES


1. Analysis of Variance.

Weight Gains of 30 Swine Fed Six Types of Rations

              Ration
  1     2     3     4     5     6
 35    40    39    27    25    22
 19    29    27     7    31    18
 41    46    20    13    46    36
 15    41    29    33    49    39
 30    34    45    35    24    20


2. Analysis of Variance.

Average Daily Gains of 24 Swine Classified According to Ration and Weight at Beginning of Period

                  Rations
Weight      1      2      3      4
  1       1.40   1.31   1.40   1.96
  2       1.79   1.30   1.47   1.77
  3       1.72   1.21   1.37   1.62
  4       1.47   1.08   1.15   1.76
  5       1.26   1.35   1.22   1.88
  6       1.28   0.95   1.48   1.50


3. Turnovers of inventory in Grocery Stores per year for 1, rural;
2, city, individual owner; 3, chain; 4, super market.

      Type of Stores
  1     2     3     4
  7    12    23    31
  5    14    20    33
  6    11    19    30
  7    14    22    30
  8    13    20    29
4. The germination of two varieties of fuzzy cotton seed treated with
various chemicals.¹

                                       Replications
Variety         Treatment         1      2      3      4
Delinted-lla    Check            703    724    785    695
                Ceresan          647    761    736    840
                152-6-B          747    746    780    728
                Spergon          637    767    808    591
                Spergonex        794    781    839    756
                Du Bay 1228R     819    839    760    760
Rogers Acala    Check            719    728    600    667
                Ceresan          714    754    646    691
                154-6-B          749    809    736    745
                Spergon          699    668    754    650
                Spergonex        450    347    230    365
                Du Bay 1228R     784    741    756    680


5. The effect of depth of planting with Ceresan-treated fuzzy and
graded acid-delinted cotton seed.¹

Variety: Acala 8

                    Depth                Replications
Treatment         (inches)       1      2      3      4
Fuzzy               1.00        687    802    814    840
Graded-del.         1.00        808    827    830    921
Fuzzy               1.75        692    799    711    737
Graded-del.         1.75        852    858    872    900
Fuzzy               2.50        270    435    480    295
Graded-del.         2.50        492    626    578    596
Fuzzy               3.25        230    209    113    138
Graded-del.         3.25        292    279    291    316


¹ Dr. K. Starr Chester and Dr. W. W. Ray, Oklahoma A. and M. College,
Stillwater, Oklahoma.



CHAPTER 22 

THE ANALYSIS OF COVARIANCE 


In Chapters 9 and 10, we dealt with only one variable at a 
time. We measured its mean, standard deviation, and variation, 
and that was all. In Chapter 11 regression was introduced as a 
method of measuring the relation between two or more variables
at once. The analysis of variance is similar to the methods pre- 
sented in Chapters 9 and 10. It measures one variable at a 
time. Covariance is a combination of variance and regression 
whereby the variance of a variable may be measured after its re- 
lationship with a second variable has been removed by regression 
analysis. Covariance is, therefore, a much more complete and 
powerful method of statistical analysis than mere variance. It 
enables the statistician to bring to bear on small samples the 
combined analysis of regression, correlation, standard error of 
estimate, and variance. Variance analysis is based on the squared 
standard deviation. Covariance is based on the squared standard 
error of estimate. Variance is a one-dimensional measure; covariance
is a two- or more-dimensional analysis.

In the previous chapters the analysis of variance indicated 
clearly that the height of children in each of the six grades was a 
separate population. This is a somewhat surprising and dis- 
concerting discovery after we have computed so many measures 
on the basis of height for all six grades as one population. In 
seeking the reason for this variation in height between grades of 
school children, we discover that the common changing variable 
is age. The children in each grade are about a year older than 
those in the next lower grade. Perhaps increasing age with ac- 
companying growth is the disturbing factor. The analysis of 
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variance will not adequately solve this problem, but the analysis 
of covariance will. Covariance introduces the trend relationship 
with age which again unifies our sample of school children. 

In Worksheet No. 104 our solution is set up in the form of six
separate but related regression problems. In Fig. 93, the data
for each of these six problems are plotted, and the regression
line for each grade, as finally computed in Worksheet No. 105, is
drawn in. Our ultimate comparison of these six grades is based
on the sum of the squared errors of estimate for each separate
grade. Grades 2, 4, and 6 have very small errors of estimate;
Grades 1, 3, and 5 have larger deviations. In Fig. 94, the Total
Regression Line and the six separate grade regression lines are
shown. The sum of the squared errors of estimate about the six
separate grade regression lines (170.704) is less than the
corresponding sum about the average within-class regression
(200.336). The difference, 200.336 − 170.704, or 29.632, arises
because the items of each class lie closer to their own class line
than to the common line.¹

The X-values in Worksheet No. 104 are the ages of the children
in months. The Y-values are the associated heights, in inches, of
the same 48 children. Since in the analysis of covariance it is
essential to compute the regression and standard error of estimate
for the paired values of each class, (1) ΣX and ΣY, the sums,
(2) ΣX², the sums of the X's squared, (3) ΣY², the sums of the Y's
squared, and (4) ΣXY, the sums of products of the paired X- and
Y-values, are accumulated at the bottom of their respective class
columns.

The First Step is the accumulation of all the required totals
from the class column sums at the bottom of Worksheet No. 104.
The sum of the X's (ΣX) for each class column is squared, as
(618)² = 381,924, (712)² = 506,944, and so on for all ΣX and all ΣY.
All of these totals are required in order to obtain the deviations
from means.

¹ The methods of this analysis are adapted from Statistical Methods,
Chapter 12, by Professor George W. Snedecor of Iowa State College, and
are used with the kind permission of Professor Snedecor and the
Collegiate Press, Inc., Ames, Iowa.
1st Step. Experiment Totals

Grade        (ΣX)²          (ΣY)²        (ΣX)(ΣY)
  1         381,924        131,769        224,334
  2         506,944        150,544        276,256
  3         667,489        170,569        337,421
  4         799,236        182,329        381,738
  5       1,058,841        203,401        464,079
  6       1,364,224        224,676        553,632
Sums      4,778,658      1,063,288      2,237,460

ΣX = 5,238     ΣY = 2,516     ΣX² = 598,574     ΣY² = 133,188     ΣXY = 279,991



Squares of X (Age) and Y (Height), Grades 1 to 3

          Grade 1            Grade 2            Grade 3
        X²       Y²        X²       Y²        X²       Y²
      6,241    2,116     7,396    2,209    11,236    2,601
      6,889    2,304     8,464    2,500    10,201    2,916
      6,400    2,116     8,100    2,209     9,409    2,601
      5,184    1,849     7,225    2,304     9,216    2,304
      5,625    1,936     8,464    2,500    10,816    2,401
      6,724    2,500     8,281    2,401    11,449    3,025
      5,625    2,025     7,921    2,304    10,404    2,704
      5,184    1,681     7,569    2,401    10,816    2,809
Sums 47,872   16,627    63,420   18,828    83,547   21,361

Squares of X (Age) and Y (Height), Grades 4 to 6

          Grade 4            Grade 5            Grade 6
        X²       Y²        X²       Y²        X²       Y²
     11,664    2,601    16,384    3,136    25,921    3,600
     13,456    3,025    17,161    3,249    20,736    3,481
     12,100    2,704    17,161    3,481    20,449    3,600
     13,924    3,249    14,641    3,136    22,801    3,600
     12,789    2,916    14,400    2,916    23,716    3,721
     11,664    2,601    19,881    3,481    19,600    3,481
     12,100    2,809    16,129    3,136    19,881    3,481
     12,321    2,916    16,900    2,916    17,956    3,136
Sums 100,018  22,821   132,657   25,451   171,060   28,100
Products XY, by Grade

      Grade 1   Grade 2   Grade 3   Grade 4   Grade 5   Grade 6
       3,634     4,042     5,406     5,508     7,168     9,660
       3,984     4,600     5,454     6,380     7,467     8,496
       3,680     4,230     4,947     5,720     7,729     8,580
       3,096     4,080     4,608     6,726     6,776     9,060
       3,300     4,600     5,096     6,102     6,480     9,394
       4,100     4,459     5,885     5,508     8,319     8,260
       3,375     4,272     5,304     5,830     7,112     8,319
       2,952     4,263     5,512     5,994     7,020     7,504
Sums  28,121    34,546    42,212    47,768    58,071    69,273


This page of squares and products of X and Y includes the
detailed computations required for Worksheet No. 104. They are

(79)² = 6,241, (83)² = 6,889,

and so on through all the X- and Y-values to the last Y-value,
(56)² = 3,136. The first product of X and Y is 79 × 46 = 3,634
for the first X- and Y-values in the first grade, and the last is
134 × 56 = 7,504 for the last pair of items in the sixth grade.
These sums of squares and products are all required for the
computation of the regression lines and standard errors of
estimate for the six separate classes, as developed in the second
section of the analysis on page 539.
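The accumulation of class sums described above can be sketched in Python. The Grade 2 ages and heights below are recovered as the square roots of the squares listed for that grade, and `class_sums` is a hypothetical helper name; the sketch simply forms N, ΣX, ΣY, ΣX², ΣY², and ΣXY for one class of paired values.

```python
def class_sums(xs, ys):
    """Return N, ΣX, ΣY, ΣX², ΣY², and ΣXY for one class of paired values."""
    return {
        "n": len(xs),
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_x2": sum(x * x for x in xs),
        "sum_y2": sum(y * y for y in ys),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
    }


# Grade 2: ages in months (X) and heights in inches (Y), paired child by child.
grade2_x = [86, 92, 90, 85, 92, 91, 89, 87]
grade2_y = [47, 50, 47, 48, 50, 49, 48, 49]
sums = class_sums(grade2_x, grade2_y)
print(sums)
```

For Grade 2 this reproduces the column totals ΣX = 712, ΣY = 388, ΣX² = 63,420, ΣY² = 18,828, and ΣXY = 34,546.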


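2d Step. Correction Terms

$\frac{(\Sigma X)^2}{N} = \frac{(5{,}238)^2}{48} = \frac{27{,}436{,}644}{48} = 571{,}596.75$

$\frac{(\Sigma Y)^2}{N} = \frac{(2{,}516)^2}{48} = \frac{6{,}330{,}256}{48} = 131{,}880.333$

$\frac{(\Sigma X)(\Sigma Y)}{N} = \frac{(5{,}238)(2{,}516)}{48} = \frac{13{,}178{,}808}{48} = 274{,}558.5$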

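3d Step. Sums of Squares

$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N} = 598{,}574 - 571{,}596.75 = 26{,}977.25$

$\Sigma y^2 = \Sigma Y^2 - \frac{(\Sigma Y)^2}{N} = 133{,}188 - 131{,}880.333 = 1{,}307.667$

$\Sigma xy = \Sigma XY - \frac{(\Sigma X)(\Sigma Y)}{N} = 279{,}991 - 274{,}558.5 = 5{,}432.5$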




Fig. 93. Class regression lines in the analysis of covariance: plotted data
and regression line for each of the six grades analyzed in covariance.
(From Worksheet No. 104)
Fig. 94. Total regression line and separate class regression lines from
the analysis of covariance: total plotted points of data on 48 Stillwater
school children, height in inches against age in months. (From Worksheets
No. 104 and No. 105)


Regression Sums and Products

Age: $\frac{4{,}778{,}658}{8} = 597{,}332.25$ and $597{,}332.25 - 571{,}596.75 = 25{,}735.5$

Height: $\frac{1{,}063{,}288}{8} = 132{,}911$ and $132{,}911 - 131{,}880.333 = 1{,}030.667$

Products: $\frac{2{,}237{,}460}{8} = 279{,}682.5$ and $279{,}682.5 - 274{,}558.5 = 5{,}124$
4th Step. Squares of Errors of Estimate

Total

$\Sigma y^2 - \frac{(\Sigma xy)^2}{\Sigma x^2} = 1{,}307.667 - \frac{(5{,}432.5)^2}{26{,}977.25} = 1{,}307.667 - 1{,}093.961 = 213.706$

Within classes

$\Sigma y^2 - \frac{(\Sigma xy)^2}{\Sigma x^2} = 277.0 - \frac{(308.5)^2}{1{,}241.75} = 277.0 - \frac{95{,}172.25}{1{,}241.75} = 277.0 - 76.643 = 200.336$


The Second Step is the computation of the three correction
values required to compute the deviations from the means:
(1) for ΣX², giving Σx²; (2) for ΣY², giving Σy²; and (3) for ΣXY,
giving Σxy. This computation is the same as that used in
computing the deviations from the mean in Chapters 11, 12,
and 21.

The Third Step is the actual computation of the deviations
from the means, Σx², Σy², and Σxy. In this case these deviations
are Σx² = 26,977.25, Σy² = 1,307.667, and Σxy = 5,432.5.
These sums are the total sums of squares and products for the
entire group, based on the assumption of the null-hypothesis of
no differences between the classes.
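The Second and Third Steps amount to subtracting a correction term from each raw sum. A minimal Python sketch, assuming only the grand totals of Worksheet No. 104 as inputs (the helper name `deviation_sums` is illustrative, not from the text):

```python
def deviation_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (Σx², Σy², Σxy): each raw sum minus its correction term."""
    sxx = sum_x2 - sum_x ** 2 / n        # Σx² = ΣX² - (ΣX)²/N
    syy = sum_y2 - sum_y ** 2 / n        # Σy² = ΣY² - (ΣY)²/N
    sxy = sum_xy - sum_x * sum_y / n     # Σxy = ΣXY - (ΣX)(ΣY)/N
    return sxx, syy, sxy


# Grand totals from Worksheet No. 104 (48 children).
sxx, syy, sxy = deviation_sums(48, 5238, 2516, 598_574, 133_188, 279_991)
print(round(sxx, 3), round(syy, 3), round(sxy, 3))
```

It returns Σx² = 26,977.25, Σy² of about 1,307.667, and Σxy = 5,432.5, the figures of the Third Step.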


REGRESSION SUMS AND PRODUCTS BETWEEN CLASSES 

The second part of the Third Step is the computation of the
deviations from the means for the between-classes regression and
standard error of estimate. These values are 25,735.5 for X (Age),
1,030.667 for Y (Height), and 5,124 for XY (Products). They are
the portion of the total sums of squares and products that is
associated with differences between the classes.
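The between-classes portion can be sketched in the same way, and the within-classes portion then follows by subtraction from the totals. This minimal Python sketch assumes equal classes of 8 children each, as in Worksheet No. 104; the variable names are illustrative.

```python
N, K = 48, 8                                  # all children; children per grade
SUM_X, SUM_Y = 5238, 2516                     # grand totals
grade_sx = [618, 712, 817, 894, 1029, 1168]   # ΣX for Grades 1 to 6
grade_sy = [363, 388, 413, 427, 451, 474]     # ΣY for Grades 1 to 6
total_xx, total_xy, total_yy = 26_977.25, 5_432.5, 1_307.667   # from the 3d Step

# Between classes: squared class totals over class size, less the correction term.
between_xx = sum(s * s for s in grade_sx) / K - SUM_X ** 2 / N
between_xy = sum(a * b for a, b in zip(grade_sx, grade_sy)) / K - SUM_X * SUM_Y / N
between_yy = sum(s * s for s in grade_sy) / K - SUM_Y ** 2 / N

# Within classes: what remains of the totals.
within_xx = total_xx - between_xx
within_xy = total_xy - between_xy
within_yy = total_yy - between_yy
print(between_xx, between_xy, round(between_yy, 3))
print(within_xx, within_xy, round(within_yy, 3))
```

The between-classes figures come out at 25,735.5, 5,124, and about 1,030.667, and the within-classes remainders at 1,241.75, 308.5, and about 277.0.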

The Fourth Step is the computation of the Squares of Errors of
Estimate (1) for the total regression line and (2) for the within-
classes regression. The first figure, 213.706, is the sum of the
squared errors of estimate about the total regression line of all
six classes thrown together as one sample. The second figure,
200.336, is the sum of the squared errors of estimate within the
six classes, measured from the average within-class regression,
that is, from lines of the common slope drawn through each class
mean.

The purpose of this part of the analysis of covariance is the
same as that of its counterpart in the analysis of variance: to
separate the total variation into two portions, one associated
with variation within classes and the other associated with
variation or differences between the classes. In the analysis of
covariance it is the sum of squares of the errors of estimate that
is divided into these two portions, that which is due to variation
within classes and that which is due to variation between classes.
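The Fourth Step applies one formula, Σy² − (Σxy)²/Σx², to two sets of sums. The Python sketch below takes the deviation sums of the Third Step as inputs; `error_ss` is a hypothetical helper name. Carried to more decimals, the within-classes figure comes out at about 200.356, a little above the 200.336 entered in the worksheet, while the total agrees at 213.706.

```python
def error_ss(sxx, sxy, syy):
    """Sum of squared errors of estimate about the straight line of Y on X."""
    return syy - sxy ** 2 / sxx


total_err = error_ss(26_977.25, 5_432.5, 1_307.667)   # total regression line
within_err = error_ss(1_241.75, 308.5, 277.0)         # within classes
print(round(total_err, 3), round(within_err, 3))
```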


5th Step.

Covariance and Test of Significance of Adjusted Class Means

                                Sums of Squares and Products         Errors of Estimate
Source of         Degrees of                                         Sums of    Degrees of   Mean
Variation         Freedom       Σx²         Σxy        Σy²           Squares    Freedom      Square
Total                47      26,977.25    5,432.5    1,307.667       213.706       46
Between classes       5      25,735.5     5,124.0    1,030.667
Within classes       42       1,241.75      308.5      277.0         200.336       41         4.886
Difference for testing adjusted class means                           13.370        5         2.674


6th Step. Correlation

Total: $r = \frac{5{,}432.5}{\sqrt{26{,}977.25 \times 1{,}307.667}} = .9146$

Between: $r = \frac{5{,}124.0}{\sqrt{25{,}735.5 \times 1{,}030.667}} = .9945$

Within: $r = \frac{308.5}{\sqrt{1{,}241.75 \times 277.0}} = .5260$


7th Step. Regression

$b = \frac{\Sigma xy}{\Sigma x^2} = \frac{308.5}{1{,}241.75} = .2484$

$\bar{X} = 109.125 \qquad \bar{Y} = 52.417$

$a = \bar{Y} - b\bar{X} = 52.417 - (.2484 \times 109.125) = 52.417 - 27.1066 = 25.31$

$Y = a + bX$

$Y = 25.31 + .2484X$   (Total Regression Line)

The Fifth Step completes the computation of the mean squares
of the errors of estimate for both the between portion and the
within portion of the total covariance, and also the F-ratio
between them. This F-value, in contrast with the F-value of
variance, shows the significance of the variation between the
Y-samples after the regression on X in each class has been removed.
In this particular case F shows the significance of the variation
in heights between classes after the effect of age on heights has been
computed out or removed.
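The F-ratio of the Fifth Step can be sketched directly from the worksheet's errors of estimate. This minimal Python example takes the printed figures 213.706 (46 df) and 200.336 (41 df) as given; the variable names are illustrative.

```python
total_err, total_df = 213.706, 46     # errors of estimate, total regression line
within_err, within_df = 200.336, 41   # errors of estimate within classes

adjusted_ss = total_err - within_err  # variation between the adjusted class means
adjusted_df = total_df - within_df    # 46 - 41 = 5
ms_between = adjusted_ss / adjusted_df
ms_within = within_err / within_df
f_ratio = ms_between / ms_within
print(round(adjusted_ss, 3), adjusted_df, round(ms_between, 3),
      round(ms_within, 3), round(f_ratio, 2))
```

The result is an adjusted sum of squares of 13.370 with 5 degrees of freedom, mean squares of about 2.674 and 4.886, and an F of about .55. Since F is below 1, the adjusted differences between grades are non-significant.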

The Sixth Step is the computation of the correlations (total,
between classes, and within classes), and the final step is the
computation of the total regression line.
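The correlations and the regression line can be sketched as follows. Following the worksheet, the slope uses the within-classes sums and the intercept uses the over-all means of X and Y; the helper name `correlation` is illustrative.

```python
import math


def correlation(sxx, sxy, syy):
    """Coefficient of correlation from deviation sums: Σxy / sqrt(Σx² Σy²)."""
    return sxy / math.sqrt(sxx * syy)


r_total = correlation(26_977.25, 5_432.5, 1_307.667)
r_between = correlation(25_735.5, 5_124.0, 1_030.667)
r_within = correlation(1_241.75, 308.5, 277.0)

# Regression line as set up in the worksheet: within-classes slope, over-all means.
b = 308.5 / 1_241.75
x_bar, y_bar = 5238 / 48, 2516 / 48
a = y_bar - b * x_bar
print(round(r_total, 4), round(r_between, 4), round(r_within, 4))
print(f"Y = {a:.2f} + {b:.4f}X")
```

This gives r of about .9146 (total), .995 (between), and .526 (within), and the line Y = 25.31 + .2484X.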

The conclusion of the analysis is that the highly significant
differences among the six grades which were revealed by the
analysis of variance are due almost entirely to differences in age.
When the regression between age and height is computed and
removed from the total variation, the remaining variance between
grades is quite insignificant.

At this point (end of Worksheet No. 104) the covariance analysis
might be considered complete. $F$, $r$, $b_{yx}$, and $S_y$ have all
been computed. As far as the summary, or total, relations are
concerned, it is complete. Additional information and confirmation
concerning the correctness of our analysis, and additional insight
into the inter-class relations, may be obtained from the same data
by the methods illustrated in Worksheet No. 105. The covariance,
regression, correlation, and standard error of each class may be
obtained. The fact that the sums of Σx², Σxy, and Σy² accumulated
from the individual classes in Worksheet No. 105 equal the Within
Classes totals of Worksheet No. 104 confirms the theoretical
correctness of Worksheet No. 104 as well as the accuracy of the
computations.
WORKSHEET NO. 105


1st Step. Separate Grade Regression Lines: Sums of Grade x²

No. 1   $\Sigma x^2 = 47{,}872 - \frac{(618)^2}{8} = 47{,}872 - 47{,}740.5 = 131.5$

No. 2   $\Sigma x^2 = 63{,}420 - \frac{(712)^2}{8} = 63{,}420 - 63{,}368 = 52$

No. 3   $\Sigma x^2 = 83{,}547 - \frac{(817)^2}{8} \approx 83{,}547 - 83{,}436 = 111$

No. 4   $\Sigma x^2 = 100{,}018 - \frac{(894)^2}{8} = 100{,}018 - 99{,}904.5 = 113.5$

No. 5   $\Sigma x^2 = 132{,}657 - \frac{(1{,}029)^2}{8} = 132{,}657 - 132{,}355.125 = 301.875$

No. 6   $\Sigma x^2 = 171{,}060 - \frac{(1{,}168)^2}{8} = 171{,}060 - 170{,}528 = 532$

2d Step. Sums of Grade y²

No. 1   $\Sigma y^2 = 16{,}627 - \frac{(363)^2}{8} = 16{,}627 - 16{,}471.125 = 155.875$

No. 2   $\Sigma y^2 = 18{,}828 - \frac{(388)^2}{8} = 18{,}828 - 18{,}818 = 10$

No. 3   $\Sigma y^2 = 21{,}361 - \frac{(413)^2}{8} = 21{,}361 - 21{,}321.125 = 39.875$

No. 4   $\Sigma y^2 = 22{,}821 - \frac{(427)^2}{8} = 22{,}821 - 22{,}791.125 = 29.875$

No. 5   $\Sigma y^2 = 25{,}451 - \frac{(451)^2}{8} = 25{,}451 - 25{,}425.125 = 25.875$

No. 6   $\Sigma y^2 = 28{,}100 - \frac{(474)^2}{8} = 28{,}100 - 28{,}084.5 = 15.5$


3d Step. Sums of Grade xy

No. 1   $\Sigma xy = 28{,}121 - \frac{(618)(363)}{8} = 28{,}121 - 28{,}041.75 = 79.25$

No. 2   $\Sigma xy = 34{,}546 - \frac{(712)(388)}{8} = 34{,}546 - 34{,}532 = 14$

No. 3   $\Sigma xy = 42{,}212 - \frac{(817)(413)}{8} = 42{,}212 - 42{,}177.625 = 34.375$

No. 4   $\Sigma xy = 47{,}768 - \frac{(894)(427)}{8} = 47{,}768 - 47{,}717.25 = 50.75$

No. 5   $\Sigma xy = 58{,}071 - \frac{(1{,}029)(451)}{8} = 58{,}071 - 58{,}009.875 = 61.125$

No. 6   $\Sigma xy = 69{,}273 - \frac{(1{,}168)(474)}{8} = 69{,}273 - 69{,}204 = 69$
The four steps in Worksheet No. 105 are required to compute the
regression coefficients, the coefficients of correlation, and the
standard errors of estimate for each of the six classes separately.
The student will notice that the totals of Σx², Σxy, and Σy² at the
bottom of the Fourth Step in Worksheet No. 105 are identical with
those of the Within Classes at the bottom of the Fifth Step in
Worksheet No. 104, apart from a difference of .125 in Σx² that comes
from rounding Grade 3's Σx² (110.875) to 111. These values are:

Σx² = 1,241.875, Σxy = 308.5, and Σy² = 277.0.

The sum of the parts equals the total. This result is to be expected
and confirms that the theory and methods of Worksheet No. 104
are correct.

The grade regression equations are required to plot the separate
class regression lines in Figs. 93 and 94.
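The separate grade regressions of Worksheet No. 105 can be sketched in Python from the grade totals printed in Worksheet No. 104. This is a minimal illustration; `grade_regression` and `GRADES` are hypothetical names. Computed without rounding, Grade 3's Σx² is 110.875 rather than 111, so the grade sums of Σx² total exactly 1,241.75, the Within Classes figure of Worksheet No. 104.

```python
import math

N_PER_GRADE = 8
# grade: (ΣX, ΣY, ΣX², ΣY², ΣXY), the grade totals printed in Worksheet No. 104
GRADES = {
    1: (618, 363, 47_872, 16_627, 28_121),
    2: (712, 388, 63_420, 18_828, 34_546),
    3: (817, 413, 83_547, 21_361, 42_212),
    4: (894, 427, 100_018, 22_821, 47_768),
    5: (1029, 451, 132_657, 25_451, 58_071),
    6: (1168, 474, 171_060, 28_100, 69_273),
}


def grade_regression(sx, sy, sx2, sy2, sxy, n=N_PER_GRADE):
    """Return Σx², Σxy, Σy², r, b, a, and the error sum of squares for one grade."""
    sxx = sx2 - sx * sx / n
    syy = sy2 - sy * sy / n
    sxy_dev = sxy - sx * sy / n
    b = sxy_dev / sxx                      # regression coefficient of Y on X
    a = sy / n - b * sx / n                # intercept: mean of Y minus b times mean of X
    r = sxy_dev / math.sqrt(sxx * syy)     # coefficient of correlation
    err = syy - sxy_dev ** 2 / sxx         # sum of squared errors of estimate
    return sxx, sxy_dev, syy, r, b, a, err


results = {g: grade_regression(*t) for g, t in GRADES.items()}
for g, (sxx, sxy_dev, syy, r, b, a, err) in results.items():
    print(f"Grade {g}: Y = {a:.2f} + {b:.5f}X, r = {r:.4f}, error SS = {err:.3f}")

# Check: the grade sums should add up to the Within Classes sums.
print(sum(v[0] for v in results.values()),
      sum(v[1] for v in results.values()),
      sum(v[2] for v in results.values()))
```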

4th Step.

Regression and Correlation Data for Each of Six Grades

                Sums of Squares and Products                                   Errors of Estimate
Grade   Degrees of    Σx²        Σxy       Σy²       Correlation   Regression    Sum of
        Freedom                                      Coefficient   Coefficient   Squares    df
  1        7         131.5      79.25    155.875     .5535         .60266        108.114     6
  2        7          52.0      14.0      10.0       .6140         .26923          6.230     6
  3        7         111.0      34.375    39.875     .5167         .30968         29.139     6
  4        7         113.5      50.75     29.875     .8715         .44713          7.181     6
  5        7         301.875    61.125    25.875     .6916         .20248         13.499     6
  6        7         532.0      69.0      15.5       .7599         .12969          6.551     6
Sums      42       1,241.875   308.5     277.0       .5260         .24841        170.704    36

Grade Regression Equations

1. $Y = -1.18 + .60266X$
2. $Y = 24.54 + .26923X$
3. $Y = 20.00 + .30968X$
4. $Y = 3.41 + .44713X$
5. $Y = 30.33 + .20248X$
6. $Y = 40.32 + .12969X$

The net result of the work in Worksheets No. 104 and No. 105
is that we have identified and measured the factor responsible for
the great variation in school children's heights and have shown
that when this factor of age-growth is eliminated there is no
other significant difference in the variation of those heights. The
analysis of variance gave a highly significant F of 31.24. In
covariance analysis, F is reduced to the non-significant value of .55.
Covariance is necessary in the solution of those problems in which
the variable we desire to measure is influenced by a secondary
variable of wide fluctuations. A common example of this condition
is the measurement of the gains of animals which are not of
identical initial weight or age. A pig weighing 20 pounds will not
ordinarily gain as much in a given time, week or month, as one
weighing 40 or 60 pounds. To make accurate measurements of
the significance of weight gains, the influence of the regression
factor must be removed. Another example is the significance of
gains in sales among firms of various sizes. A firm with a capital
of $10,000 could not expect to show the same gain in sales achieved
by a corporation of $100,000 or $1,000,000 capital. The regression
between sales and capital must be removed before the true
significance of sales increase can be measured. The same method
must be used in comparing cities, counties, or states on any of the
many factors that depend on size of population.


Analysis of Errors of Estimate from Average Regression
Within Groups (Classes)

                                                  Degrees of   Errors of Estimate
Sources of Variation                              Freedom      Sum of Squares   Mean Square
1. Deviations from average (error) regression
   within lots, Fifth Step, Worksheet No. 104        41           200.336          4.886
2. Deviations from individual lot (grade)
   regressions, Fourth Step, Worksheet No. 105       36           170.704          4.742
3. Differences among lot (grade) regressions          5            29.632          5.926


This difference of 29.632, with 5 degrees of freedom, between the
errors of estimate of the original within variation (200.336 with
41 df) and the totaled deviations from the six separate class
regressions (170.704 with 36 df) is due to variation among the six
regression lines, and is attributed here to Experimental Error. In
a planned experiment based on a Latin Square, some, perhaps most,
of it could be removed.
COVARIANCE IN SOCIAL SCIENCES

The analysis of covariance is used primarily in biology, 
animal husbandry, agronomy, entomology, and related fields, but 
it may be used in economics, psychology, sociology, and education. 

Consider the question, “Are there any significant differences
in the number of eating and drinking places in (1) small, (2)
medium, and (3) large cities in Pennsylvania if population size is
removed by regression computations?” The answer appears in
Worksheet No. 106.


WORKSHEET NO. 106

Analysis of Covariance of Number of Eating and Drinking
Places (Y) and Population (X, in thousands) of Small, Medium,
and Large Cities in Pennsylvania, 1930

          Small               Medium              Large
          2,500-5,000         15,000-25,000       80,000 and up
City      X        Y          X        Y          X           Y
 1        2.8      2          21.4     39         116.0       220
 2        3.6      7          23.6     47         143.4       435
 3        4.5     12          15.6     38          92.6       207
 4        3.5      4          16.5     42          86.6       305
 5        4.9      9          18.2     55       1,951.0     3,923
 6        2.9      4          16.0     23          80.3       211
 7        3.8      7          19.5     46          82.1       127
 8        4.8     13          21.4     62         669.8     1,261
Sums     30.8     58         152.2    352       3,221.8     6,689
ΣX²      123.20              2,955.98            4,318,315.41
ΣY²      528                 16,452              17,414,199
ΣXY      243.5               6,822.4             8,659,241.0

1st Step. Totals

Class         (ΣX)²            (ΣY)²          (ΣX)(ΣY)
Small         948.64           3,364          1,786.4
Medium        23,164.84        123,904        53,574.4
Large         10,379,995.24    44,742,721     21,550,620.2
Sums          10,404,108.72    44,869,989     21,605,981.0

ΣX = 3,404.8    ΣY = 7,099.0    ΣX² = 4,321,394.59    ΣY² = 17,431,179    ΣXY = 8,666,306.9
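2d Step. Correction Terms

$\frac{(\Sigma X)^2}{N} = \frac{(3{,}404.8)^2}{24} = \frac{11{,}592{,}663.04}{24} = 483{,}027.62$

$\frac{(\Sigma Y)^2}{N} = \frac{(7{,}099)^2}{24} = \frac{50{,}395{,}801}{24} = 2{,}099{,}825.04$

$\frac{(\Sigma X)(\Sigma Y)}{N} = \frac{(3{,}404.8)(7{,}099)}{24} = \frac{24{,}170{,}675.2}{24} = 1{,}007{,}111.46$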

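3d Step. Sums of Squares

$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N} = 4{,}321{,}394.59 - 483{,}027.62 = 3{,}838{,}366.97$

$\Sigma y^2 = \Sigma Y^2 - \frac{(\Sigma Y)^2}{N} = 17{,}431{,}179 - 2{,}099{,}825.04 = 15{,}341{,}353.96$

$\Sigma xy = \Sigma XY - \frac{(\Sigma X)(\Sigma Y)}{N} = 8{,}666{,}306.9 - 1{,}007{,}111.46 = 7{,}659{,}195.44$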

Regression Sums and Products

Population: $\frac{10{,}404{,}108.72}{8} = 1{,}300{,}513.59$ and $1{,}300{,}513.59 - 483{,}027.62 = 817{,}485.97$

Eating and Drinking Places: $\frac{44{,}869{,}989}{8} = 5{,}608{,}748.62$ and $5{,}608{,}748.62 - 2{,}099{,}825.04 = 3{,}508{,}923.58$

Products: $\frac{21{,}605{,}981.0}{8} = 2{,}700{,}747.62$ and $2{,}700{,}747.62 - 1{,}007{,}111.46 = 1{,}693{,}636.16$
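4th Step. Squares of Errors of Estimate

Total

$\Sigma y^2 - \frac{(\Sigma xy)^2}{\Sigma x^2} = 15{,}341{,}353.96 - \frac{(7{,}659{,}195.44)^2}{3{,}838{,}366.97} = 15{,}341{,}353.96 - 15{,}283{,}396 = 57{,}957.96$

Within Classes

$\Sigma y^2 - \frac{(\Sigma xy)^2}{\Sigma x^2} = 11{,}832{,}429 - \frac{(5{,}965{,}559)^2}{3{,}020{,}881} = 51{,}795.2$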
5th Step.

Covariance and Test of Significance of Adjusted Class Means

                              Sums of Squares and Products              Errors of Estimate
Source of       Degrees of                                              Sums of      Degrees of   Mean
Variance        Freedom       Σx²          Σxy          Σy²             Squares      Freedom      Square
Total              23      3,838,366    7,659,195    15,341,353        57,957.96       22
Between cities      2        817,485    1,693,636     3,508,924
Within cities      21      3,020,881    5,965,559    11,832,429        51,795.20       20         2,589.8
Difference for testing adjusted means                                   6,162.76        2         3,081.4

F = 3,081.4 / 2,589.8 = 1.19 (not significant)


Variance Analysis of Number of Eating and Drinking Places
in Pennsylvania Cities, Small, Medium, and Large, 1930

Source of          Degrees of     Sum of          Mean
Variation          Freedom        Squares         Squares
Total                 23          15,341,353
Between Cities         2           3,508,924      1,754,462
Within Cities         21          11,832,429        563,425

F = 1,754,462 / 563,425 = 3.117 (almost significant)


This F-value of 3.117 for variance falls just short of the value
required for a significant difference between these cities, but the
covariance F-value of 1.19 is clearly non-significant. This result
shows that when we eliminate the difference in the number of eating
and drinking places that is caused by population differences,
Pennsylvania cities show no significant variation. In proportion to
size, the small cities have about as many eating and drinking places
as the larger cities.

LATIN SQUARE 

As has already been indicated, the use of variance and covariance
analysis is adapted primarily to planned experiments, especially
in agriculture, although these methods are not at all limited to
that field. Whatever the field may be in which they are used, the
best results are obtained from planned experiments. It is not
likely that data picked up at random, or which were assembled for
other purposes, will give dependable results or contribute to
scientific discovery just because they are run through the variance
or covariance mill. More and more researchers are coming to
understand that to obtain dependable results of scientific value
the data must be obtained from carefully prepared experiments set
up in conformity with the most scientific methods available.
Scientific knowledge doesn't “jes’ grow”; it is wrought out by
great labor.

As was indicated in Worksheet No. 102, the data values in the
rows as well as in the columns should be as free from outside and
biased variation as possible. It is possible to eliminate all or
most of this bias by computing the variance associated with the
rows, as was done in that case.

If, however, such bias can be prevented from entering the data
at all, it is so much the better. Sources of such bias in field
experiments in agriculture are unequal fertility of soil among the
various plots, unequal drainage, more severe wind pressure on 
one side of the plants or plots, and other such inherent physical 
characteristics of the field, many of which cannot be wholly 
eliminated. 

The Latin Square is a research device perfected by Fisher for
counteracting or canceling out these diversities and variations.
It is designed to have as many rows as columns, so that each
treatment appears in various portions of the plot. If there are
n rows, there are n columns and n separate treatments, arranged
so that each treatment appears once and only once in each column
and in each row.
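One simple way to lay out such a square is by cyclic shifting, followed by a random reordering of rows and columns. The Python sketch below shows one possible construction, offered as an assumption rather than a procedure given in this chapter; the function names are hypothetical. It also checks the defining property that each treatment appears once and only once in every row and every column.

```python
import random


def cyclic_latin_square(treatments):
    """Row i is the list of treatments shifted i places to the left."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """True if each symbol appears once and only once in every row and column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(square[r][c] for r in range(n)) == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


def randomized_latin_square(treatments, seed=None):
    """Shuffle the rows and the columns of a cyclic square; it stays a Latin Square."""
    rng = random.Random(seed)
    square = cyclic_latin_square(treatments)
    rng.shuffle(square)                    # permute whole rows
    cols = list(range(len(treatments)))
    rng.shuffle(cols)                      # permute whole columns
    return [[row[c] for c in cols] for row in square]


for row in randomized_latin_square("ABCD", seed=1):
    print(" ".join(row))
```

Permuting whole rows or whole columns never puts a treatment twice in the same row or column, which is why the randomized square keeps the Latin property.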
Fig. 95. Idealized Latin Square











In this square there are eight treatments (fertilizers or time of 
planting or depth of planting or kind of seed or condition of seed 
or any other of eight different conditions) each of which appears 
once and once only in each row and in each column. The eight 
are so evenly distributed over the field that any difference in soil, 
drainage, wind, or other physical condition inherent in the field 
would affect all treatments in the same way. This arrangement 
makes the experimental base uniform. If one side of the plot is 
poorly drained, this bad condition is equally distributed among 
all varieties because some of all varieties are in the poorly drained 
land. If one side of the field is especially fertile, this beneficial 
condition is evenly distributed among all varieties, because some 
of all varieties are planted on the fertile side of the field. The 
good and the bad conditions of the physical environment are 
evenly distributed among all varieties because all varieties are 
scattered all over the plot or field. In both statistical theory and 
method this is an ideal procedure. All planned experiments as 
far as possible should follow this principle. 

If it is impossible or inconvenient to set up a true Latin Square, 
similar results may be obtained from a Randomized Block System 
shown in Fig. 90. 

Of course, a Latin Square may be of any size, but those from
3 × 3 to 6 × 6 are most frequently used.

MEANING AND USE OF COVARIANCE 

1. Covariance analysis combines the methods of the analysis of
variance with correlation and regression analysis. Variance analysis
measures the degree of significance of the variation among two
or more classes as to a single variable, such as (1) gain in weight,
(2) growth in height, (3) incomes, (4) sales, (5) tensile strength,
or any other variable which may change from class to class. In
each, comparison is made between the several classes or groups
for only one variable at a time.

Covariance analysis measures the degree of significance of the
variation among two or more classes as to a single variable after
its relationship with a second variable has been computed by
regression methods and removed from the deviations or variation of
the first variable. In testing the effects which a given ration
produces in the gain in weight of pigs, it is desirable to have all
the pigs in the several pens of the same age, weight, breed, etc.
Often this is impossible because of the lack of a sufficient number
of identical pigs. This lack of identical pigs would be a serious
handicap in the use of the analysis of variance, but it need not at
all hinder one in reaching dependable results with the analysis of
covariance. All that is required in the covariance analysis is to
compute the regression between original weight (X₂) and gains
(X₁ or Y), and remove this amount of variation from the total
variation for each pen or group. With the effect of this outside
variable removed, a dependable comparison can be made between
the remaining sums of variation among the several groups of pigs.
The analysis of covariance enables one to use with dependable
results much data which otherwise must be discarded or, if used,
employed with a large degree of error.
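A common way to express the result of removing the regression is as adjusted class means: the class mean of Y less b times the amount by which the class mean of X exceeds the grand mean of X, where b is the average within-class regression coefficient. The Python sketch below applies that standard formula to two pens of pigs. The pen data, the names, and the use of this formula here are illustrative assumptions, not figures or steps from this chapter.

```python
def mean(values):
    return sum(values) / len(values)


def adjusted_means(groups):
    """groups: dict name -> (xs, ys). Returns (b_within, {name: adjusted mean of Y})."""
    # Pool the within-class sums of squares and products across all classes.
    sxx = sxy = 0.0
    for xs, ys in groups.values():
        mx, my = mean(xs), mean(ys)
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b_within = sxy / sxx
    grand_x = mean([x for xs, _ in groups.values() for x in xs])
    # Shift each class mean of Y to what it would be at the grand mean of X.
    adjusted = {name: mean(ys) - b_within * (mean(xs) - grand_x)
                for name, (xs, ys) in groups.items()}
    return b_within, adjusted


pens = {  # hypothetical pens: initial weight X, gain Y
    "Ration A": ([20, 30, 40], [40, 50, 60]),
    "Ration B": ([40, 50, 60], [55, 65, 75]),
}
print(adjusted_means(pens))
```

Here the within-class regression coefficient is 1.0. Ration B's pigs show the larger raw mean gain (65 against 50), but they also started heavier; adjusted to the common mean initial weight of 40, Ration A's mean (60) exceeds Ration B's (55).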

2. Covariance analysis as here developed assumes straight line 
regression, which will not apply accurately in many cases. It is, 
however, possible to use curvilinear regression in the analysis of 
covariance just as it is in correlation analysis. This work is somewhat
beyond the level of an elementary course but is available
in more advanced texts.¹

3. Covariance analysis would be useful in measuring (1) the 
degree of significance of variation in profits among grocery stores 
of varying sizes in different cities, (2) the incomes of machine 
shops of varying sizes in different communities, (3) the production 
of eggs by hens of varying ages for different breeds, (4) the resist- 
ances of materials of varying thicknesses in different uses, (5) the 
yields of wheat or other crop with varying degrees of rainfall on 
different soils. 

These examples will indicate a few of the many fields in which 
this complex but powerful research device may be used profitably. 
It is so important in the analysis of small samples obtained from 

^ Fisher, R. A., Statistical Methods for Research Workers, Oliver & Boyd, 
Ltd., Edinburgh, 1936. Snedecor, George W., Statistical Methods, Chapter 
13, Collegiate Press, Ames, Iowa, 1938. 
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planned experiments in agriculture and in the biological sciences 
that a brief presentation of its theory and methods is required in 
an elementary text. 


SUMMARY 

1. Covariance is a method of analysis which combines the methods of 
the analysis of variance with the methods of regression analysis and corre- 
lation analysis. It is one of the most powerful and exact of all the methods 
of statistical analysis and gives dependable results only when applied to 
data obtained from carefully planned experiments. 

2. The analysis of variance is based on the squared standard devia- 
tion, but the analysis of covariance is based on the squared standard 
errors of estimate. 

3. The analysis of covariance removes from the total variation of the 
factors to be compared that portion of the variation of each factor which 
is due to the influence or association of a second or outside factor. 

4. The relationship between the two variables may be either
straight-line or curvilinear. The examples presented in this text
are all straight-line and are, therefore, not applicable to clearly
curvilinear relationships.

5. Regression coefficients and coefficients of correlation may be com- 
puted for each class or section of paired variables as well as for the total 
group of data. 

6. The Latin Square is a device used in planned experiments which 
permits each variable to be subjected to all positions and influences of 
random variations in order to nullify or eliminate such extraneous factors 
from the results. It is especially useful in agricultural experiments. 

7. The analysis of covariance provides four results as follows: 

a. The coefficients of correlation between the two variables in each
block and for the total of all the blocks.

b. The regression lines for each block and for the total data of all 
blocks combined. 

c. The standard error of estimate for each block and for the total 
data of all blocks, and 

d. The significance of the residual variations between all the blocks 
of data after the influence of the independent variable has been removed. 
REVIEW QUESTIONS

1. Define Covariance. 

2. What is the difference between variance and covariance? 

3. What is the function of regression in covariance? 

4. What is the use of the standard error of estimate in covariance? 

5. What is the relation of the regression lines of the separate groups 
to the total regression line? 

6. Why is the sum of the standard errors of estimate of the several
groups smaller than the standard error of estimate for the total
regression line?

7. What is a Latin square and what is its value? How is one con- 
structed? 

8. What are Randomized Blocks? What is their use? How are they 
set up? 

9. Why may F for covariance be smaller than F for variance? 

EXERCISES 

1. Original Weight (X) and Gain (Y) During Feeding Experiment of
21 Hogs on 3 Rations.

         Ration 1        Ration 2        Ration 3
Hog    Weight  Gain    Weight  Gain    Weight  Gain
          X     Y         X     Y         X     Y
 1       51    1.1       55    1.7       48    1.3
 2       62    1.3       69    1.9       58    1.4
 3       48    1.2       47    1.5       39    1.2
 4       40    1.1       52    1.6       55    1.3
 5       39     .9       59    1.6       61    1.4
 6       50    1.2       70    1.8       45    1.2
 7       58    1.3       63    1.7       53    1.3
2. Capital in Millions (X) and Income in Millions (Y) of (1) Railways,
(2) Public Utilities, and (3) Industrial Corporations. Moody's Railways,
1941.

             Railways        Public Utilities    Industrial Corporations
Company   Capital  Income    Capital  Income        Capital  Income
             X       Y          X       Y              X       Y
   1         91     6.7         31     1.1            154    10.3
   2         44     4.8         67     2.1             24     2.0
   3         10      .6        136     3.5             38     4.8
   4        130     5.4        132     3.2             73     3.1
   5         60     1.9        192     4.8            143    15.0
   6        510    16.8         70     2.4             18     2.5
   7        710    44.1         33      .9             11     2.3
   8         41     1.3         80     2.5            256    21.8


3. Advertising Expense (X) and Sales (Y) in 5 Sales Territories,
1936-1941.

X = Advertising Expense; Y = Sales % Expansion

        Territory 1  Territory 2  Territory 3  Territory 4  Territory 5
Year        X     Y      X     Y      X     Y      X     Y      X     Y
1936      210   4.2    560  10.3  1,000  21.4    870  19.3    430   8.7
1937      325   6.1    480   9.2    850  18.7    680  14.7    455   9.2
1938      280   5.4    610  12.4  1,260  24.8    750  17.7    640  12.7
1939      362   7.9    655  14.1  1,420  27.1    880  21.1    580  11.9
1940      480   9.1    790  18.7  1,664  32.8  1,070  23.1    695  15.2
1941      375   7.4    595  11.6    975  19.9    820  18.8    475  10.4





Part Five 

Curvilinear and Multiple 
Methods of Analysis 

CHAPTER 23 

CURVILINEAR REGRESSION 


In Chapter 11 the free-hand and mathematical techniques of
linear regression were presented. Such methods are adequate only
for straight-line relationships, which are relatively rare in the
economic and social world as well as in the biological and physical
universe. Most relationships among variables are not straight-line
and can be described and measured accurately only by some
curvilinear function.

Supply and demand schedules usually run in curved lines.
Utility does not increase or diminish by successive uniform
absolute amounts; it declines more rapidly at first and more slowly
later. Overhead or fixed costs, and even direct and selling costs,
vary according to curvilinear functions. Doubling output will
reduce fixed costs per unit by about 50%, but trebling production
will cut overhead costs per unit by only 16.7% more (of the original
per-unit amount). Interest rates, wage scales, raw-material costs,
and most other business factors vary in curvilinear relations.
Because trees and animals are three-dimensional, they tend to
increase in volume and weight at a faster rate than in height.
Because the burden of weight due to gravitation increases in
animals faster than muscular strength, small animals are relatively
stronger than large ones. A squirrel in proportion to its weight is
stronger than a horse. In proportion to its volume an ant can do
more work than an elephant. The relationships among strength,
weight, and volume are not straight-line. The graphs and formulas
which fill every textbook on astronomy, chemistry, physics, and
geology indicate that the relationships in the physical sciences
are curvilinear. It seems necessary, therefore, to include even in
an elementary text in statistics a brief analysis of simple
curvilinear regression. The methods presented here are simple and
do not require a knowledge of mathematics beyond first-semester
algebra.
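The fixed-cost figures quoted above are simple arithmetic: with total fixed cost unchanged, fixed cost per unit varies as 1/output, which is a curve rather than a straight line. A short Python check (the function name is illustrative):

```python
def per_unit_reduction_percent(output_multiple):
    """Percent cut in fixed cost per unit when output is multiplied by
    output_multiple and total fixed cost stays the same."""
    return 100 * (1 - 1 / output_multiple)


doubled = per_unit_reduction_percent(2)   # 50.0
trebled = per_unit_reduction_percent(3)   # about 66.7
print(f"double output: {doubled:.1f}% lower per unit")
print(f"treble output: {trebled:.1f}% lower, only {trebled - doubled:.1f}% more")
```

Each further increase in output buys a smaller additional cut, which is exactly the curvilinear pattern the paragraph describes.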

CLASS AVERAGES METHOD 

The student is already familiar with the method of fitting a 
straight line trend through two sets of class averages. The only 
modification of this method necessary to fit a curved line is to 
set the data up in more than two class intervals, preferably five or 
more. The essential steps in the technique are as follows: 

1. Plot the data on coordinate graph paper. 

2. Arrange the data in an array for values of X, ranging from
the smallest to the largest items, keeping each Y-value associated
with its own X-value in a companion column regardless of the
order or size of the Y-values, as indicated in Worksheet No. 107.

3. Obtain the sums of each column. 

4. Compute the means for each column. 

5. Plot these paired averages on the coordinate graph paper on 
which the original data were plotted. 

6. Draw as smooth a curve as possible through or among these 
plotted averages. The reason for not always drawing the curve 
so that it passes exactly through the averages is that the averages 
may vary erratically because of the smallness of the sample. A 
larger sample would give more dependable averages. By drawing 
the curved line between or among the averages in the form of a 
smooth curve one avoids the influences of chance variations in a 
small sample. 
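Steps 2 to 5 can be sketched in Python as below, using the 25 yield-cost pairs and the six class intervals of Worksheet No. 107 (cost in dollars). It is an illustration only: the function name is hypothetical, and step 6, drawing the smooth curve, is left to the statistician's judgment on the graph.

```python
# (yield per acre in bushels X, cost per bushel in dollars Y), Worksheet No. 107
farms = [
    (8, 2.00), (9, 1.84), (10, 1.56), (11, 1.29), (12, 1.08),
    (14, 0.93), (14, 0.89), (15, 0.98), (16, 0.86), (17, 0.86),
    (18, 0.75), (19, 0.67), (20, 0.67), (21, 0.73), (22, 0.54),
    (24, 0.50), (26, 0.48), (27, 0.58), (28, 0.49), (29, 0.60),
    (30, 0.60), (31, 0.67), (32, 0.62), (33, 0.70), (34, 0.64),
]
# Class intervals for X, with inclusive limits
classes = [(8, 9), (10, 12), (13, 16), (17, 21), (22, 28), (29, 34)]


def class_averages(pairs, limits):
    """Return (mean X, mean Y, number of items) for each class interval of X."""
    # Step 2: array the pairs by X; each Y stays with its own X
    ordered = sorted(pairs, key=lambda pair: pair[0])
    results = []
    for low, high in limits:
        members = [(x, y) for x, y in ordered if low <= x <= high]
        n = len(members)
        # Steps 3 and 4: column sums, then column means
        results.append((sum(x for x, _ in members) / n,
                        sum(y for _, y in members) / n, n))
    return results


# Step 5: these are the paired averages to plot on the scatter diagram
for mean_x, mean_y, n in class_averages(farms, classes):
    print(f"n = {n}  mean yield {mean_x:5.2f}  mean cost ${mean_y:.3f}")
```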

Two class intervals limit one to a straight-line trend. In many
cases, such as the one above, the true regression is curvilinear.
WORKSHEET NO. 107

Method of Determining Curvilinear Regression Through Class
Averages for Yield per Acre and Cost per Bushel
for Wheat on 25 North Dakota Farms

                X 8-9      X 10-12      X 13-16      X 17-21      X 22-28      X 29-34
             X      Y     X      Y     X      Y     X      Y     X      Y     X      Y
         Yield   Cost Yield   Cost Yield   Cost Yield   Cost Yield   Cost Yield   Cost
             8  $2.00    10  $1.56    14   $.93    17   $.86    22   $.54    29   $.60
             9   1.84    11   1.29    14    .89    18    .75    24    .50    30    .60
                         12   1.08    15    .98    19    .67    26    .48    31    .67
                                      16    .86    20    .67    27    .58    32    .62
                                                   21    .73    28    .49    33    .70
                                                                             34    .64
Sum         17  $3.84    33  $3.93    59  $3.66    95  $3.68   127  $2.59   189  $3.83
No.          2      2     3      3     4      4     5      5     5      5     6      6
Average    8.5   1.92  11.0   1.31 14.75   .915  19.0   .736  25.4   .518  31.5   .638

To fit such a curve, more than two class intervals are necessary. 
The statistician must be guided by his best judgment in such 
cases, but may make as many classes as he thinks the form of the 
data requires. The class intervals need not be of exactly equal 
width. On the graph it will appear often that there are open 
places or definite breaks in the data. It is better to place the 
ends of the class intervals at these open spots, or points of definite 
break in the data. If such breaks do not occur, the class intervals 
may be made equal or unequal arbitrarily as seems most likely to 
result in the best regression line. The best line is the one that 
splits the plotted points approximately equally from one side of 
the graph to the other. Where the trend of the data changes di- 
rection rapidly, it is better to have more and narrower class in- 
tervals in order to indicate more definitely the correct variation 
in the line. 

The six pairs of class averages from Worksheet No. 107 are
located on Fig. 97 at the points of the small hollow circles. A
smooth curve is then drawn through or among these averages.
The curve is in the general form of a simple parabola.

It will be found that it is very easy and economical to fit
curvilinear regression lines by this method. It not only requires
very little time and very few computations, but if the sample is of
fair size, the line will fit remarkably well. The line cannot,
however, be expressed in a formula and manipulated algebraically.
Even with this limitation, it is a very useful device. Such a trend
line will usually give the statistician a much clearer idea of the
type of mathematical line he should select for the data than he
would have if he did not fit the trend line through class averages.

[Fig. 97. Data and class averages for wheat yields, plotted from
Worksheet No. 107, showing free-hand curve fitted to class averages.
Vertical axis: cost in cents per bu.]
STANDARD ERROR OF ESTIMATE AND CORRELATION
COEFFICIENT FOR FREE-HAND REGRESSION LINES

Since the free-hand method has no regression equation, the usual
mathematical computation of a standard error of estimate and
coefficient of correlation is not possible. An approximation can be
obtained by a graphic method: the deviations of the plotted points,
which represent the data, from the free-hand regression line are
read off the graph. These read-off deviations are comparable in
meaning to the mathematically computed deviations from a
mathematically located regression line; they bear the same
relationship to the free-hand line that the computed deviations bear
to the computed line. In both cases they measure the scatter of the
data around the regression line, that is, the extent to which the
regression line fails to account fully for the exact size and
location of the given data. They indicate the extent to which factors
other than the independent, or X-, variable are influencing the
measurements of the dependent variable on the Y-axis.¹

WORKSHEET NO. 108

Computation of Free-Hand Curve Standard Error of Estimate
and Coefficient of Correlation for 25 North Dakota Farms

Independent  Read-off               Independent  Read-off
Variable     Deviations             Variable     Deviations
    X            Z        Z²            X            Z        Z²
    8           -5        25           21          +10       100
    9           +5        25           22           -6        36
   10           +3         9           24           -5        25
   11           -3         9           26           -5        25
   12           -9        81           27           +5        25
   14           -2         4           28           -6        36
   14           -6        36           29           +2         4
   15           +8        64           30            0         0
   16           +2         4           31           +5        25
   17           +7        49           32           -3         9
   18            0         0           33           +2         4
   19           -5        25           34           -7        49
   20           -1         1
                                              Total ΣZ² =    670

$$S_y^2 = \frac{\Sigma Z^2}{N} = \frac{670}{25} = 26.8 \qquad S_y = \sqrt{26.8} = 5.18$$

$$\sigma_y^2 = \frac{\Sigma y^2}{N} = \frac{39{,}992.36}{25} = 1{,}599.69 \qquad \sigma_y = 40.0$$

$$\rho^2 = 1 - \frac{S_y^2}{\sigma_y^2} = 1 - \frac{26.8}{1{,}599.69} = 1 - .0168 = .9832 \qquad \rho = .99$$

One Sy, measured plus and minus from the free-hand regression
line, includes 17 of the 25 items in the sample, or 68 percent of
them. In this respect also it conforms closely to the meaning of a
mathematical standard error of estimate measured from a mathematical
regression line. If the sample were larger, the chances would be
greater that one Sy plus and minus would include 68.26 percent of
the sample items.
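The arithmetic of Worksheet No. 108 can be checked with a short Python sketch. It assumes the read-off deviations Z (in cents) and the variance 1,599.69 exactly as given in the worksheet; it computes Sy, counts the items lying within one Sy of the free-hand line, and computes ρ as the square root of 1 minus Sy²/σy².

```python
import math

# Read-off deviations Z from the free-hand curve, Worksheet No. 108 (X = 8 ... 34)
z = [-5, 5, 3, -3, -9, -2, -6, 8, 2, 7, 0, -5, -1,
     10, -6, -5, -5, 5, -6, 2, 0, 5, -3, 2, -7]
sigma_y_sq = 1599.69  # variance of cost per bushel, from the worksheet

sum_z_sq = sum(d * d for d in z)
s_y = math.sqrt(sum_z_sq / len(z))             # free-hand standard error of estimate
within_one_s = sum(1 for d in z if abs(d) <= s_y)
rho = math.sqrt(1 - s_y ** 2 / sigma_y_sq)     # curvilinear coefficient of correlation

print(f"Sy = {s_y:.2f}; {within_one_s} of {len(z)} items "
      f"({100 * within_one_s / len(z):.0f}%) lie within one Sy; rho = {rho:.2f}")
```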

The free-hand method is especially well suited to the analysis 
of large samples which are long and tedious for complex mathe- 
matical analysis. The free-hand method enables the statistician 
to experiment quickly and cheaply with many large samples from 
a population as a means of deciding on a permanent mathematical 
function. It is an excellent descriptive device. 


Curvilinear Correlation 


Rho, ρ, is the symbol used to indicate the curvilinear coefficient
of correlation. The coefficient r, which was employed in earlier
chapters, is used to indicate only straight-line correlation; to
distinguish between the curvilinear and the linear relationship, ρ
is employed. It is usually computed by subtracting the fraction
$S_y^2/\sigma_y^2$ from 1, as is indicated in Formula No. 88

$$\rho^2 = 1 - \frac{S_y^2}{\sigma_y^2}$$
Formula No. 88. Curvilinear coefficient of determination

and

$$\rho = \sqrt{1 - \frac{S_y^2}{\sigma_y^2}}$$
Formula No. 89. Curvilinear coefficient of correlation

Since the fraction $S_y^2/\sigma_y^2$ is the measure of the proportion
of the variation in Y which is not associated with variation in X,
and since unity (1) is the measure of perfect or complete
correlation, subtracting this fraction from 1 gives the measure of
the proportion of the variation in Y that is associated with X. These
formulas may be used with either free-hand curves or mathematical
regression lines.

¹ The deviation can be measured with a thin piece of cardboard or
graph paper marked with 0 at a midpoint and with the scale of the
Y-axis of the graph marked on it from zero (0) up and down. If this
measuring device is held vertically, parallel to the Y-axis, with 0
on the free-hand curve, the deviations of the points from the curve
can be quickly read and recorded.


Relation Between Regression and Coefficient 
of Correlation 

The concepts of correlation and regression are closely related. 
The size of the coefficient of correlation for any given set of data 
depends to a considerable extent on the degree to which the under- 
lying regression equation and line measure the true relation 
between the variables. If the relation between the variables is 
truly a straight line, a larger coefficient of correlation will be 
obtained if a straight regression line is used. If, however, the 
relation between the variables is actually curvilinear, a larger co- 
efficient of correlation will be obtained if the correct curvilinear 
regression equation is chosen. It is impossible to get a coefficient 
of correlation which will measure the full amount of relationship 
in the data unless the regression equation which best measures 
that relationship is chosen. 


Least Squares Method of Computing
Simple Parabola

The equation for a simple parabola is $Y = a + bX + cX^2$. In
the actual results obtained for curvilinear data, the algebraic
signs of b and of c are opposite. If the line rises to a maximum
and then falls, there will be a +b and a -c. If, on the other hand,
the line falls to a minimum and then rises, there will be a -b
and a +c. The a measures the Y-intercept. The b measures the
tendency for the line to turn in one direction, while the c
measures the tendency for the line to turn in the opposite
direction. Since b and c have opposite signs, and since b is
weighted by X while c is weighted by X², as the values of X
increase the product cX² tends to exceed the product bX and
gradually turns the line in the opposite direction.

The necessity of computing three unknowns, a, b, and c, requires
three normal equations. They are obtained by multiplying the
original equation $Y = a + bX + cX^2$, written for each pair of
data, successively by the coefficients of a, b, and c (that is, by
1, X, and X²) and summing.

Formula No. 90
The normal equations become

$$(1)\quad \Sigma Y = Na + b\,\Sigma X + c\,\Sigma X^2$$

$$(2)\quad \Sigma XY = a\,\Sigma X + b\,\Sigma X^2 + c\,\Sigma X^3$$

$$(3)\quad \Sigma X^2Y = a\,\Sigma X^2 + b\,\Sigma X^3 + c\,\Sigma X^4$$
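As a check on Formula No. 90, the three normal equations can be built from the raw sums and solved exactly with Python's `fractions` module, which sidesteps the rounding trouble that the long numbers discussed below cause on a calculating machine. This is a sketch under stated assumptions: the data are the 25 farms of Worksheet No. 109 (Y in cents), the elimination routine is a plain Gaussian elimination written for this example, and because nothing is rounded the constants may differ slightly from the hand solution that follows.

```python
from fractions import Fraction

# (yield X in bushels, cost Y in cents per bushel), Worksheet No. 109
farms = [(9, 184), (8, 200), (10, 156), (11, 129), (33, 70), (20, 67), (34, 64),
         (31, 67), (14, 93), (18, 75), (15, 98), (21, 73), (16, 86), (12, 108),
         (14, 89), (19, 67), (27, 58), (29, 60), (32, 62), (17, 86), (24, 50),
         (22, 54), (26, 48), (28, 49), (30, 60)]


def total(f):
    """Sum of f(X, Y) over all farms."""
    return sum(f(x, y) for x, y in farms)


def solve(matrix, rhs):
    """Gaussian elimination with exact fractions. The normal-equation matrix is
    positive definite, so no zero pivot arises and no row exchanges are needed."""
    n = len(rhs)
    rows = [[Fraction(v) for v in matrix[i]] + [Fraction(rhs[i])] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            factor = rows[i][k] / rows[k][k]
            rows[i] = [p - factor * q for p, q in zip(rows[i], rows[k])]
    solution = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(rows[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (rows[i][n] - known) / rows[i][i]
    return solution


N = len(farms)
# Formula No. 90: rows come from multiplying Y = a + bX + cX^2 by 1, X, X^2
normal_matrix = [
    [N, total(lambda x, y: x), total(lambda x, y: x**2)],
    [total(lambda x, y: x), total(lambda x, y: x**2), total(lambda x, y: x**3)],
    [total(lambda x, y: x**2), total(lambda x, y: x**3), total(lambda x, y: x**4)],
]
normal_rhs = [total(lambda x, y: y), total(lambda x, y: x * y), total(lambda x, y: x**2 * y)]

a, b, c = solve(normal_matrix, normal_rhs)
print(f"Y = {float(a):.2f} + ({float(b):.4f})X + ({float(c):.5f})X^2")
```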

To compute a parabola from these equations the sums or totals
of the data must be used. This has the disadvantage that for a
large sample the numbers become long and difficult to get on the
calculating machine. A great saving of time can be made by
coding X² as U, a third variable, and reducing the equations to
deviations from the means of the variables. With this
transformation a is computed separately by the equation

$$a = \bar{Y} - b\bar{X} - c\bar{U} \qquad \text{Formula No. 91}$$

and b and c are computed from the equations

$$(1)\quad (\Sigma x^2)\,b + (\Sigma xu)\,c = \Sigma xy$$

$$(2)\quad (\Sigma xu)\,b + (\Sigma u^2)\,c = \Sigma uy \qquad \text{Formula No. 92}$$

in which the deviations y, x, and u replace the Y, X, and U of the
original normal equations. Each equation is shortened by one term,
and the number of equations is reduced from three to two. This is
an economy well worth while. The use of the corrections and the
deviations from the means computed at the bottom of Worksheet
No. 109 further reduces the work required to fit a parabola. The
short solution illustrated on page 560 minimizes the task.
WORKSHEET NO. 109

Computation of Simple Parabola for Cost per Bushel and
Yield per Acre of Wheat for 25 North Dakota Farms

X = Yield of wheat in bushels; Y = Cost per bushel in cents

              X      Y    U=X²    XU=X³       U²=X⁴       XY     UY=X²Y
              9    184      81      729       6,561    1,656     14,904
              8    200      64      512       4,096    1,600     12,800
             10    156     100    1,000      10,000    1,560     15,600
             11    129     121    1,331      14,641    1,419     15,609
             33     70   1,089   35,937   1,185,921    2,310     76,230
             20     67     400    8,000     160,000    1,340     26,800
             34     64   1,156   39,304   1,336,336    2,176     73,984
             31     67     961   29,791     923,521    2,077     64,387
             14     93     196    2,744      38,416    1,302     18,228
             18     75     324    5,832     104,976    1,350     24,300
             15     98     225    3,375      50,625    1,470     22,050
             21     73     441    9,261     194,481    1,533     32,193
             16     86     256    4,096      65,536    1,376     22,016
             12    108     144    1,728      20,736    1,296     15,552
             14     89     196    2,744      38,416    1,246     17,444
             19     67     361    6,859     130,321    1,273     24,187
             27     58     729   19,683     531,441    1,566     42,282
             29     60     841   24,389     707,281    1,740     50,460
             32     62   1,024   32,768   1,048,576    1,984     63,488
             17     86     289    4,913      83,571    1,462     24,854
             24     50     576   13,824     331,776    1,200     28,800
             22     54     484   10,648     234,256    1,188     26,136
             26     48     676   17,576     456,976    1,248     32,448
             28     49     784   21,952     614,656    1,372     38,416
             30     60     900   27,000     810,000    1,800     54,000
Sums        520  2,153  12,418  325,996   9,103,116   38,544    837,168
Means      20.8  86.12  496.72
Corrections             10,816  258,294   6,168,269   44,782  1,069,438
Deviations               1,602   67,702   2,934,845   -6,238   -232,270

Detailed Computation of Corrections

$$\Sigma x^2 = \Sigma X^2 - N\bar{X}^2 = 12{,}418 - 25(20.8)^2 = 12{,}418 - 10{,}816 = 1{,}602$$

$$\Sigma xu = \Sigma XU - N\bar{X}\bar{U} = 325{,}996 - 25(20.8)(496.72) = 325{,}996 - 258{,}294 = 67{,}702$$

$$\Sigma u^2 = \Sigma U^2 - N\bar{U}^2 = 9{,}103{,}116 - 25(496.72)^2 = 9{,}103{,}116 - 6{,}168{,}269 = 2{,}934{,}845$$

$$\Sigma xy = \Sigma XY - N\bar{X}\bar{Y} = 38{,}544 - 25(20.8)(86.12) = 38{,}544 - 44{,}782 = -6{,}238$$

$$\Sigma uy = \Sigma UY - N\bar{U}\bar{Y} = 837{,}168 - 25(496.72)(86.12) = 837{,}168 - 1{,}069{,}438 = -232{,}270$$

The student should check these detailed computations back 
against the Sums, Means, Corrections, and Deviations at the 
bottom of Worksheet No. 109. 

Short Method of Computing Deviations

$$\Sigma x^2 = \Sigma X^2 - (\Sigma X)\bar{X} = 12{,}418 - (520)(20.8) = 12{,}418 - 10{,}816 = 1{,}602$$

One multiplication and one subtraction are all the computations
required by the short method, as compared with two multiplications
and a subtraction in the detailed method. The detailed method makes
every step complete and clear. The short method saves time and
should always be used after the student understands the relations
involved. The short method is based on the fact that
$\Sigma X = N\bar{X}$, $\Sigma Y = N\bar{Y}$, $\Sigma U = N\bar{U}$, and,
therefore, $(\Sigma X)\bar{X} = N\bar{X}^2$,
$(\Sigma X)\bar{Y} = (\Sigma Y)\bar{X} = N\bar{X}\bar{Y}$,
$(\Sigma U)\bar{X} = N\bar{U}\bar{X}$, and $(\Sigma U)\bar{U} = N\bar{U}^2$.

$$\Sigma xu = \Sigma XU - (\Sigma X)\bar{U} = 325{,}996 - (520)(496.72) = 325{,}996 - 258{,}294 = 67{,}702$$

$$\Sigma u^2 = \Sigma U^2 - (\Sigma U)\bar{U} = 9{,}103{,}116 - (12{,}418)(496.72) = 9{,}103{,}116 - 6{,}168{,}269 = 2{,}934{,}845$$

$$\Sigma xy = \Sigma XY - (\Sigma X)\bar{Y} = 38{,}544 - (520)(86.12) = 38{,}544 - 44{,}782 = -6{,}238$$

$$\Sigma uy = \Sigma UY - (\Sigma U)\bar{Y} = 837{,}168 - (12{,}418)(86.12) = 837{,}168 - 1{,}069{,}438 = -232{,}270$$

Five multiplications and five subtractions are sufficient to 
compute the deviations for the equations of a parabola after the 
sums of the original data are obtained and the three means com- 
puted. On the machines the student can soon perform the oper- 
ations quickly, which takes the drudgery out of computing 
parabolas. 
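The sums at the foot of Worksheet No. 109 and the short-method deviations can be checked with a few lines of Python (a sketch; the variable names are illustrative, and exact fractions are used for the means so that no rounding enters). One caution: recomputing ΣX⁴ from the data gives 9,103,066 rather than the worksheet's 9,103,116, because 17⁴ is 83,521, not the 83,571 entered for that farm. The difference is small, but it explains why Σu² appears as 2,934,797 in the cubic solution later in the chapter instead of the 2,934,845 used for the simple parabola.

```python
from fractions import Fraction

# (yield X, cost Y in cents) for the 25 farms, in the order of Worksheet No. 109
farms = [(9, 184), (8, 200), (10, 156), (11, 129), (33, 70), (20, 67), (34, 64),
         (31, 67), (14, 93), (18, 75), (15, 98), (21, 73), (16, 86), (12, 108),
         (14, 89), (19, 67), (27, 58), (29, 60), (32, 62), (17, 86), (24, 50),
         (22, 54), (26, 48), (28, 49), (30, 60)]

N = len(farms)
sum_x = sum(x for x, _ in farms)
sum_y = sum(y for _, y in farms)
sum_u = sum(x**2 for x, _ in farms)        # U = X^2
sum_xu = sum(x**3 for x, _ in farms)       # XU = X^3
sum_uu = sum(x**4 for x, _ in farms)       # U^2 = X^4
sum_xy = sum(x * y for x, y in farms)
sum_uy = sum(x**2 * y for x, y in farms)   # UY = X^2 Y

# Exact means, so the corrections carry no rounding
mean_x, mean_y, mean_u = Fraction(sum_x, N), Fraction(sum_y, N), Fraction(sum_u, N)

# Short method: deviation sum = raw sum - (column total)(mean)
dev = {
    "x2": sum_u - sum_x * mean_x,
    "xu": sum_xu - sum_x * mean_u,
    "u2": sum_uu - sum_u * mean_u,
    "xy": sum_xy - sum_x * mean_y,
    "uy": sum_uy - sum_u * mean_y,
}
for name, value in dev.items():
    print(f"sum {name} = {float(value):,.2f}")
```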


SOLUTION OF PARABOLA

Equations in terms of deviations from the means

$$(\Sigma x^2)\,b + (\Sigma xu)\,c = \Sigma xy$$

$$(\Sigma xu)\,b + (\Sigma u^2)\,c = \Sigma uy$$

Substitutions from Worksheet No. 109

(1) 1,602b + 67,702c = -6,238

(2) 67,702b + 2,934,845c = -232,270

Divide each equation through by the coefficient of b in that
equation

(1) b + 42.2609c = -3.89388

(2) b + 43.3494c = -3.43077

Subtract (2) from (1)

0 - 1.0885c = -.46311

c = .425457

Substitute arithmetic value for c in equation (1)

b + 42.2609(.425457) = -3.89388
b + 17.980196 = -3.89388
b = (-17.980196) + (-3.89388)
b = -21.874076
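The division-and-subtraction steps just shown can be mirrored in Python using the deviation sums printed in Worksheet No. 109. This is an illustrative sketch; because it carries full precision instead of the rounded quotients used by hand, c and b come out slightly different in the fifth significant figure.

```python
# Deviation sums as printed in Worksheet No. 109
sxx, sxu, suu = 1602, 67702, 2934845
sxy, suy = -6238, -232270

# Divide each equation through by its coefficient of b
k1, r1 = sxu / sxx, sxy / sxx      # (1) b + k1*c = r1
k2, r2 = suu / sxu, suy / sxu      # (2) b + k2*c = r2

# Subtract (2) from (1) to eliminate b, then solve for c
c = (r1 - r2) / (k1 - k2)
# Substitute c back into equation (1) to obtain b
b = r1 - k1 * c

print(f"c = {c:.6f}, b = {b:.6f}")
```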
Compute $a = \bar{Y} - b\bar{X} - c\bar{U}$

a = 86.12 - (-21.87408)(20.8) - (.425457)(496.72)
a = 86.12 + 454.98 - 211.33
a = 329.77

Regression Line for Simple Parabola

$$Y = a + bX + cX^2$$

Y = 329.77 - 21.87408X + .42546X²


Standard Error of Estimate for Simple Parabola

How closely does the simple parabola regression line fit these
North Dakota wheat yield-cost data? With what degree of
accuracy could the cost per bushel be computed from the simple
parabola regression equation? The standard error of estimate is
the answer.

The computation of the standard error of estimate by individual
items requires the substitution of the X- and X²-values of the
original data in the equation Y = a + bX + cX² for every pair of
the original data. In this case each pair of X and X² is
substituted in the equation Y = 329.77 - 21.8741X + .4255X², as is
indicated in Worksheet No. 110.

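The correlation coefficient based on the parabola is .97, which
is much larger than that based on a straight line.

$$\rho^2 = 1 - \frac{S_y^2}{\sigma_y^2} = 1 - \frac{94.7444}{1{,}599.69} = 1 - .0592 = .9408 \qquad \rho = .97$$

In comparison with this result, for the straight regression line

$$r^2 = 1 - \frac{625}{1{,}599.69} = 1 - .39 = .61 \qquad r = .78$$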


This small Sy of 9.74 for the parabola may be compared with
the much larger Sy of 25.02 for the straight regression line
computed for these same data in Chapter 11. This reduction of
about 61% in the size of the standard error of estimate indicates
that the parabola fits the sample much better than does the
straight regression line. Cost of production per bushel can be
estimated much more accurately from the parabola than from the
straight-line trend.

WORKSHEET NO. 110

Computation of Standard Error of Estimate for Cost of Wheat
per Bushel and Yield per Acre for 25 North Dakota Farms
Based on Simple Parabolic Regression Line
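The comparison between the straight line and the parabola can be reproduced from the raw data. A minimal sketch in Python, assuming the 25 pairs of Worksheet No. 109 (Y in cents): it fits both lines by least squares in deviation form and computes r and ρ as the square root of the share of Σy² that each line accounts for. Because it starts from the raw data rather than the rounded worksheet figures, the last decimals need not match the hand computation, but it should confirm that the parabola accounts for much more of the variation.

```python
import math

# (yield X in bushels, cost Y in cents per bushel), Worksheet No. 109
farms = [(9, 184), (8, 200), (10, 156), (11, 129), (33, 70), (20, 67), (34, 64),
         (31, 67), (14, 93), (18, 75), (15, 98), (21, 73), (16, 86), (12, 108),
         (14, 89), (19, 67), (27, 58), (29, 60), (32, 62), (17, 86), (24, 50),
         (22, 54), (26, 48), (28, 49), (30, 60)]

xs = [x for x, _ in farms]
ys = [y for _, y in farms]
us = [x * x for x in xs]  # U = X^2


def dev_sum(p, q):
    """Sum of products of deviations from the means, such as sum xy."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    return sum((pi - mp) * (qi - mq) for pi, qi in zip(p, q))


sxx, sxu, suu = dev_sum(xs, xs), dev_sum(xs, us), dev_sum(us, us)
sxy, suy, syy = dev_sum(xs, ys), dev_sum(us, ys), dev_sum(ys, ys)

# Straight line: size of r (the relation is inverse, so r itself is negative)
r = math.sqrt(sxy ** 2 / (sxx * syy))

# Parabola: solve the two equations of Formula No. 92 by Cramer's rule
det = sxx * suu - sxu ** 2
b = (sxy * suu - sxu * suy) / det
c = (sxx * suy - sxu * sxy) / det
# Share of sum y^2 accounted for by the parabola, then its square root
rho = math.sqrt((b * sxy + c * suy) / syy)

print(f"straight line r = {r:.2f}, parabola rho = {rho:.2f}")
```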

The free-hand curve through class averages, as computed in
Worksheet No. 107 and shown in Fig. 97, fits these data still
better. The comparable standard error of estimate based on the
read-off deviations from the free-hand line is only a little over
5 cents, little more than one-half of that based on the parabola.
For these particular data the parabola is too extreme a curve. It
falls too low at the minimum and rises too rapidly as it approaches
the maximum values of the data. There seems to be no sound
theoretical ground for thinking that the cost per bushel of
producing wheat, as related to wheat yields, is a true parabolic
function. The parabola represents the relationship fairly well but
not perfectly. Since the function is not truly parabolic, the
parabola becomes only a descriptive device, less accurate in this
case than the free-hand curve through class averages.

Cubic Parabola 

Since the simple parabola does not measure accurately the
relationship between cost per bushel of producing wheat and
yields, it would be well to try the cubic parabola, which is less
extreme at those points where the simple parabola missed the
data farthest. The cubic parabola is a double reverse curve in the
general form of an irregular letter “S” lying on its side.

The equation for a cubic parabola is $Y = a + bX + cX^2 + dX^3$.

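The four normal equations necessary to compute the four
constants a, b, c, and d are:

Formula No. 93

$$(1)\quad \Sigma Y = Na + b\,\Sigma X + c\,\Sigma X^2 + d\,\Sigma X^3$$

$$(2)\quad \Sigma XY = a\,\Sigma X + b\,\Sigma X^2 + c\,\Sigma X^3 + d\,\Sigma X^4$$

$$(3)\quad \Sigma X^2Y = a\,\Sigma X^2 + b\,\Sigma X^3 + c\,\Sigma X^4 + d\,\Sigma X^5$$

$$(4)\quad \Sigma X^3Y = a\,\Sigma X^3 + b\,\Sigma X^4 + c\,\Sigma X^5 + d\,\Sigma X^6$$

When the equations are reduced to deviations from the means, with
X³ coded as V, a is computed by

$$a = \bar{Y} - b\bar{X} - c\bar{U} - d\bar{V} \qquad \text{Formula No. 94}$$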
WORKSHEET NO. 111

Additional Columns Required to Compute Cubic Parabola for Cost
per Bushel of Wheat and Wheat Yields, 25 North Dakota Farms

(V = X³; farms in the same order as in Worksheet No. 109)

   X        UV = X⁵          V² = X⁶       VY = X³Y
   9         59,049            531,441        134,136
   8         32,768            262,144        102,400
  10        100,000          1,000,000        156,000
  11        161,051          1,771,561        171,699
  33     39,135,393      1,291,467,969      2,515,590
  20      3,200,000         64,000,000        536,000
  34     45,435,424      1,544,804,416      2,515,456
  31     28,629,151        887,503,681      1,995,997
  14        537,824          7,529,536        255,192
  18      1,889,568         34,012,224        437,400
  15        759,375         11,390,625        330,750
  21      4,084,101         85,766,121        676,053
  16      1,048,576         16,777,216        352,256
  12        248,832          2,985,984        186,624
  14        537,824          7,529,536        244,216
  19      2,476,099         47,045,881        459,553
  27     14,348,907        387,420,489      1,141,614
  29     20,511,149        594,823,321      1,463,340
  32     33,554,432      1,073,741,824      2,031,616
  17      1,419,857         24,137,569        422,518
  24      7,962,624        191,102,976        691,200
  22      5,153,632        113,379,904        574,992
  26     11,881,376        308,915,776        843,648
  28     17,210,368        481,890,304      1,075,648
  30     24,300,000        729,000,000      1,620,000

                 U           V            UV               V²            VY
Sums        12,418     325,996   264,677,370    7,908,790,498    20,933,898
Means       496.72   13,039.84
Corrections                      161,928,733    4,250,935,680    28,074,775
Deviations                       102,748,637    3,657,854,818    -7,140,877
The constants b, c, and d are computed from the equations

$$(\Sigma x^2)\,b + (\Sigma xu)\,c + (\Sigma xv)\,d = \Sigma xy$$

$$(\Sigma xu)\,b + (\Sigma u^2)\,c + (\Sigma uv)\,d = \Sigma uy \qquad \text{Formula No. 95}$$

$$(\Sigma xv)\,b + (\Sigma uv)\,c + (\Sigma v^2)\,d = \Sigma vy$$

In the actual solution of the problem for curvilinear data the
signs of b, c, and d alternate. If b is plus, c is minus and d is
plus; if b is minus, c is plus and d is minus.

By coding X² as U and X³ as V and computing the deviations
from the means, the labor of fitting a cubic parabola is reduced to
a minimum. It is necessary to add only three columns of figures
to Worksheet No. 109 in order to compute the cubic parabola.
These columns are X⁵, X⁶, and X³Y.


Arithmetic Quantities Required for Cubic Parabola

Σx² = 1,602
Σxu = 67,702
Σu² = 2,934,797
Σxv = 2,322,349¹
Σuv = 102,748,637
Σv² = 3,657,854,818
Σxy = -6,238
Σuy = -232,270
Σvy = -7,140,877

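NORMAL EQUATIONS AS DEVIATIONS FROM MEANS

$$(1)\quad (\Sigma x^2)\,b + (\Sigma xu)\,c + (\Sigma xv)\,d = \Sigma xy$$

$$(2)\quad (\Sigma xu)\,b + (\Sigma u^2)\,c + (\Sigma uv)\,d = \Sigma uy$$

$$(3)\quad (\Sigma xv)\,b + (\Sigma uv)\,c + (\Sigma v^2)\,d = \Sigma vy$$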

The student should note certain uniformities about these equa- 
tions. 

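¹ Σxv is the only quantity which does not appear in Worksheets
No. 109 and No. 111. Since XV = X⁴, ΣXV is the sum of the X⁴
column of Worksheet No. 109 taken with 17⁴ = 83,521, which gives
9,103,066. It is computed as follows:

$$\Sigma xv = \Sigma XV - (\Sigma X)\bar{V} = 9{,}103{,}066 - (520)(13{,}039.84) = 9{,}103{,}066 - 6{,}780{,}717 = 2{,}322{,}349$$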





1. The coefficient (Σxu) of the first term in the second equation
is the same as the coefficient (Σxu) of the second term in the
first equation.

2. The coefficient (Σxv) of the first term in the third equation
is the same as the coefficient (Σxv) of the third term of the first
equation.

3. The coefficient (Σuv) of the second term in the third equation
is the same as the coefficient (Σuv) of the third term of the
second equation.

4. The sums of the squares, Σx², Σu², and Σv², of the three
variables x, u, and v run diagonally through the three equations
from the first term of the first equation to the third term of the
third equation.

5. These relationships hold for these least squares equations
because of the method by which they were obtained, that of
multiplying the original equation for the function desired
successively by the coefficients of the unknowns desired, whether
it be for a, b, c, d, e . . . or any number of them. An understanding
of these relationships should aid the student in remembering these
equations and in understanding their functions and meanings.

DOOLITTLE METHOD OF SOLVING EQUATIONS 

When three or more simultaneous equations are to be solved, 
the ordinary arithmetic methods become long and cumbersome. 
The Doolittle Method systematizes the successive steps in a 
briefer process which is less likely to lead to errors. All students
should learn it as an efficient, economical statistical aid. It may
seem difficult at first, but if many or long problems are to be
solved, the time spent in learning it is a great economy.

For the solution of the equations in the Wheat Cost-Yield 
Problem the steps are: 

1. Substitution of numerical values, in terms of deviations from
the means, in the equations

(1) 1,602b + 67,702c + 2,322,349d = -6,238

(2) 67,702b + 2,934,797c + 102,748,637d = -232,270

(3) 2,322,349b + 102,748,637c + 3,657,854,818d = -7,140,877
2. Solution of the Normal Equations by the Doolittle Method

WORKSHEET NO. 112

Forward Solution

(1)                           1,602b  +       67,702c  +      2,322,349d  =     -6,238
(1')                              -b  -    42.260924c  -   1,449.656055d  =   +3.89419
(2)                          67,702b  +    2,934,797c  +    102,748,637d  =   -232,270
(1) × (-42.260924)          -67,702b  -    2,861,149c  -     98,144,614d  =   +263,624
(Σ1,2)                                        73,648c  +      4,604,023d  =    +31,354
(2')                                              -c  -     62.5138904d  =  -.4296926
(3)                       2,322,349b  +  102,748,637c  +  3,657,854,818d  = -7,140,877
(1) × (-1,449.656055)    -2,322,349b  -   98,144,614c  -  3,366,607,290d  = +9,042,954
(Σ1,2) × (-62.5138904)                  -4,604,023c  -    287,815,389d  = -1,960,060
(Σ1,2,3)                                                   3,432,139d  =    -57,983
(3')                                                              -d  =     +.0169

Back Solution

              b              c             d
(1)      -3.894190      +.4296926        -.0169
(2)     +24.499187     +1.0564847
(3)     -62.807226
(4)     -42.202229     +1.4861773        -.0169


Detailed Explanation of Doolittle Solution

Forward Solution

1. Bring down equation (1).

2. Divide equation (1) through by the coefficient of b, which is 1,602 in this case, writing the quotients in the appropriate columns in the line below, with signs changed. This is the first derived equation, (1').

3. Bring down the second equation, (2).

4. Multiply equation (1) through by the coefficient of the second term, (c), in the first derived equation (1'), which is -42.2609238 in this case, and write the products in the appropriate columns in the line immediately below equation (2).

5. Draw a line under this line and add. The result, marked (Σ1,2), is the sum of equation (2) and the line of products below it; the b term is eliminated.

6. Divide the summary equation (Σ1,2) through by the coefficient of its first term, (c), which is 73,648 in this case, and write the quotients in the appropriate columns in the line below, with the signs changed. This is the second derived equation, (2').

7. Bring down equation (3).

8. Multiply equation (1) through by the coefficient of d in equation (1'), which is -1,449.656055 in this case, and write the products in the appropriate columns in the first line below equation (3).

9. Multiply equation (Σ1,2) through by the coefficient of d in the second derived equation, (2'), which is -62.5138904 in this case, and write the products in the appropriate columns in the second line below equation (3).

10. Add equation (3) to the two lines below it. The result is summary equation (Σ1,2,3).

11. Divide summary equation (Σ1,2,3) by the coefficient of d, which is 3,432,139 in this case, and write the results in the line below, with the signs changed. This is the third derived equation, (3').

Backward Solution, or Substitution of Known Values to Find Other Values

1. Write down in line (1), in the appropriate column, the last value in each derived equation, with the sign changed. In this case these values are, under d, -.0169; under c, +.4296926; and under b, -3.894190.

2. Multiply the next-to-last value in the first and second derived equations by the known value of d. The computations in this case are (-62.5138904) × (-.0169) = +1.0564847 and (-1,449.656055) × (-.0169) = +24.499187. Write these two products in line (2) under "Back Solution."

3. Add the two values in the c column for the value of c. In this case the computation is

(+.4296926) + (+1.0564847) = +1.4861773.

4. Multiply the third-from-last value in the first derived equation (1'), which is -42.2609238 in this case, by the value of c. The computation is

(-42.2609238) × (+1.4861773) = -62.807226.

Write this product in line (3) under "Back Solution."

5. Sum the values under b in lines (1), (2), and (3). The total is -42.202229 in this case.

The back solution is merely a systematic, concise method of substituting known values for unknowns until all the unknowns are computed. First, c is obtained by substituting the known value of d. Then b is obtained by substituting the known values of d and c.
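The forward and back solutions can be condensed into a short routine. The Python sketch below follows the numbered steps above; the name `doolittle_solve` is our own, and because it reuses the coefficient of each unknown in the earlier derived equations as the multiplier, it assumes a symmetric set of equations such as the normal equations and should not be applied to arbitrary systems. It carries no rounding between steps, so its b and c may differ somewhat from the printed worksheet values; d agrees to four decimal places.

```python
def doolittle_solve(coeffs, rhs):
    """Solve symmetric normal equations by the Doolittle forward and back solution.

    coeffs: square, symmetric list of rows; rhs: right-hand sides.
    Returns the unknowns in column order.
    """
    n = len(coeffs)
    summary = []   # (1), (Σ1,2), (Σ1,2,3), ...: coefficients plus right-hand side
    derived = []   # (1'), (2'), (3'), ...: summary rows divided through, signs changed
    for k in range(n):
        # Bring down equation k.
        row = [float(v) for v in coeffs[k]] + [float(rhs[k])]
        # Add each earlier summary equation multiplied by the coefficient of
        # the k-th unknown in the matching derived equation.
        for summ, der in zip(summary, derived):
            row = [r + der[k] * s for r, s in zip(row, summ)]
        summary.append(row)
        pivot = row[k]
        derived.append([-v / pivot for v in row])
    # Back solution: derived equation k reads -x_k + sum(der[j] * x_j) = der[n]
    # over j > k, so x_k = -der[n] + sum(der[j] * x_j) using values already found.
    x = [0.0] * n
    for k in reversed(range(n)):
        der = derived[k]
        x[k] = -der[n] + sum(der[j] * x[j] for j in range(k + 1, n))
    return x


# Normal equations of the Wheat Cost-Yield Problem, in deviation form.
wheat = [
    [1602, 67702, 2322349],
    [67702, 2934797, 102748637],
    [2322349, 102748637, 3657854818],
]
wheat_rhs = [-6238, -232270, -7140877]
b, c, d = doolittle_solve(wheat, wheat_rhs)
print(round(d, 4))  # -0.0169
```

Keeping the summary and derived equations as separate lists mirrors the two kinds of lines on the worksheet, which makes it easy to compare each intermediate row with a hand computation.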

Computation of a

$$a = \bar{Y} - b\bar{X} - c\bar{U} - d\bar{V}$$

$$\begin{aligned}
a &= 86.12 - (-42.20223)(20.8) - (1.48618)(496.72) - (-.0169)(13{,}039.84)\\
  &= 86.12 + 874.4861 - 736.3377 + 220.3733\\
  &= 444.6417
\end{aligned}$$

Regression Equation

$$Y = a + bX + cX^2 + dX^3$$

$$Y = 444.6417 - 42.2022X + 1.4862X^2 - .0169X^3$$


STANDARD ERROR OF ESTIMATE FOR
CUBIC PARABOLA

By substituting successively all the values of X, X², and X³ of the original data in the regression equation above, the estimated values of Y may be computed. Here (Y - Y') = z.

For the cubic parabola fitted to the wheat cost-yield data

Sy = 7.15
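The computation of Sy can be sketched as follows. In this Python illustration the function name is our own, the data are hypothetical, and dividing the sum of the squared residuals (Y - Y')² by N is an assumption; a divisor of N - M (as in Formula No. 109 of Chapter 24) is available through the `constants` parameter.

```python
import math


def std_error_of_estimate(X, Y, coefs, constants=0):
    """Standard error of estimate for Y' = a + bX + cX^2 + dX^3.

    coefs: (a, b, c, d). constants: number M subtracted from N in the
    divisor (0 divides by N; 4 would divide by N - M for a cubic).
    """
    a, b, c, d = coefs
    # Residuals: actual Y minus the value estimated from the curve.
    z = [y - (a + b * x + c * x ** 2 + d * x ** 3) for x, y in zip(X, Y)]
    return math.sqrt(sum(zi ** 2 for zi in z) / (len(X) - constants))


# Hypothetical data and a hypothetical fitted line Y' = 1 + 2X.
print(std_error_of_estimate([1, 2, 3, 4], [4, 4, 8, 8], (1, 2, 0, 0)))
```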
TABLE 42

Comparison of Standard Errors of Estimate for Four Regression Lines

     Measure                                    Sy
1    Straight line, least squares             25.02
2    Simple parabola, least squares            9.74
3    Cubic parabola, least squares             7.15
4    Free-hand line through class averages     5.14


The free-hand line through class averages "fits" these wheat cost-yield data more closely than any of the mathematically computed lines.¹ This does not mean that no mathematical line could be fitted which would represent the data better than the free-hand curve. Such a line could be obtained, but its form and function are not immediately evident. A considerable amount of study and experimentation would no doubt be necessary to find it.

SIGNIFICANCE OF REGRESSION LINE 

If it were evident from logical, known relations and careful experimentation that some particular mathematical function accurately represented and measured the cost-yield relationship, it would be best to use such a line to express the relationship, even though the data in any particular sample failed to conform to such a line. If a mathematical function accurately and permanently measures the relationship between two or more variables, the relationship should be expressed in that function and equation. The law of falling bodies is an example. The distance a body falls, (d), in feet, is related to the time in which it falls, (t), in seconds, as follows:

$$d = 16t^2$$

In 10 seconds it would fall 1,600 feet, or

$$d = 16 \times 10^2 = 16 \times 100 = 1{,}600$$

In 30 seconds it would fall 14,400 feet, assuming no air resistance, or

$$d = 16 \times 30^2 = 16 \times 900 = 14{,}400 \text{ feet}$$

¹ See Ezekiel, M., Methods of Correlation Analysis, pp. 105-112, for further discussion of the value of free-hand curves and the methods of determining their relative fit to the data as compared with mathematical curves.

The physical sciences are replete with such relationships, but the social sciences deal with such complex relationships involving so many variables that in most cases no clearly recognized mathematical functions have yet been discovered which logically and permanently fit social and economic relationships. It is, perhaps, not too much to hope that in the future more study and experimentation may reduce some of these relationships to definite functions. Until such results are known, a mathematical curve fitted to such data is scarcely more than a descriptive device.

Fig. 97. Comparison of regression lines for wheat yields on 25 farms, simple parabola and cubic parabola

Logarithmic Regression Lines 

The logarithms of either Y or X, or of both, may be substituted for the actual data in computing a regression line.

Formula No. 96

$$\log Y = a + bX$$
$$Na + b\sum X = \sum \log Y$$
$$a\sum X + b\sum X^2 = \sum X \log Y$$

WORKSHEET NO. 113

Columns:  X     Y     Log Y      X Log Y      X²
Totals:   ΣX          Σ Log Y    ΣX Log Y     ΣX²

Formula No. 97

$$Y = a + b\log X$$
$$Na + b\sum \log X = \sum Y$$
$$a\sum \log X + b\sum (\log X)^2 = \sum Y \log X$$

WORKSHEET NO. 114

Columns:  X     Log X      Y      Y Log X      (Log X)²
Totals:         Σ Log X    ΣY     ΣY Log X     Σ(Log X)²

Formula No. 98

$$\log Y = a + b\log X$$
$$Na + b\sum \log X = \sum \log Y$$
$$a\sum \log X + b\sum (\log X)^2 = \sum \log X \log Y$$

WORKSHEET NO. 115

Columns:  X     Log X      Y     Log Y      Log X Log Y      (Log X)²
Totals:         Σ Log X          Σ Log Y    Σ Log X Log Y    Σ(Log X)²


Many other combinations of log curves and parabolas are 
possible and are used for measuring compound interest and 
growth relationships, but these are rather difficult for an elemen- 
tary course. The Selected References at the end of this chapter 
will supply the student with further information for fitting such 
curves. 
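Each of Formulas No. 96 to 98 is an ordinary straight-line fit once the data in the chosen columns are replaced by their logarithms, so one routine can serve all three. The Python sketch below is a minimal illustration under that reading: the helper names are our own, common (base-10) logarithms are used as in the worksheets, and the two normal equations are solved directly for a and b.

```python
import math


def fit_line(xs, ys):
    """Solve Na + bΣx = Σy and aΣx + bΣx² = Σxy for a and b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


def fit_log_form(X, Y, form):
    """Fit Formula No. 96, 97, or 98 by transforming the data to common logs."""
    log = math.log10
    if form == 96:   # Log Y = a + bX
        return fit_line(X, [log(y) for y in Y])
    if form == 97:   # Y = a + b Log X
        return fit_line([log(x) for x in X], Y)
    if form == 98:   # Log Y = a + b Log X
        return fit_line([log(x) for x in X], [log(y) for y in Y])
    raise ValueError("form must be 96, 97, or 98")


# Hypothetical data lying exactly on Y = 3X², a straight line on double log paper.
X = [1, 2, 3, 4, 5]
a, b = fit_log_form(X, [3 * x ** 2 for x in X], 98)
print(round(a, 4), round(b, 4))
```

For Formula No. 98 the constant a is the logarithm of the multiplier, so its antilogarithm recovers 3 in this example.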


Effect of Reversing Variables on Regression 

In this chapter up to this point we have dealt with the single relationship of regression between Y as a dependent variable and X as an independent variable. For a straight line the equations are

$$Y = a + b_{yx}X \qquad \text{Formula No. 99}$$

$$Na + b_{yx}\sum X = \sum Y, \qquad a\sum X + b_{yx}\sum X^2 = \sum XY \qquad \text{Formula No. 100}$$

in which the subscripts yx under $b_{yx}$ mean that this particular b measures the regression of Y on X.

If, however, the variables were switched, and if in Worksheet No. 38 Cost per Bushel became X and Yield per Acre became Y, the equations and regression line would change to

$$X = a + b_{xy}Y \qquad \text{Formula No. 101}$$

$$Na + b_{xy}\sum Y = \sum X, \qquad a\sum Y + b_{xy}\sum Y^2 = \sum XY \qquad \text{Formula No. 102}$$

($b_{xy}$ is read "X on Y").

These two regression lines are not identical. In the first case we estimate Y from X, and the standard error of estimate is measured in units of Y. In the second case we estimate X from Y, and the standard error of estimate is measured in units of X. The second regression line, $X = a + b_{xy}Y$, is rarely used or required. It may be computed for later use in computing the coefficient of correlation, but in that case it is limited to simple linear correlation, which was treated in Chapter 12.
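A small numerical case shows why the two lines differ. The Python sketch below uses hypothetical data and computes both slopes from deviations, $b_{yx} = \sum xy / \sum x^2$ and $b_{xy} = \sum xy / \sum y^2$, which is what Formulas No. 100 and No. 102 reduce to when the variables are measured from their means. The slopes agree only when the correlation is perfect; in general their product equals r², a standard identity.

```python
def regression_slopes(X, Y):
    """Return (b_yx, b_xy) computed from deviations from the means."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))
    sxx = sum((x - mx) ** 2 for x in X)
    syy = sum((y - my) ** 2 for y in Y)
    return sxy / sxx, sxy / syy   # regression of Y on X, and of X on Y


# Hypothetical data.
byx, bxy = regression_slopes([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(byx, bxy, byx * bxy)  # 0.6 1.0 0.6
```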


OTHER CURVE TYPES¹

Without illustrating them with problems, a few other types of curves are presented. Most of these can be reduced to a logarithmic form and computed without serious difficulty by a student who can compute a parabola.

Compound Interest Curve

$$y = P(1 + r)^x$$

r = rate of interest

x = years (interest compounded annually)

P = principal

y = the sum to which the principal amounts at the end of x years

This equation may be stated logarithmically as follows:

$$\log y = \log P + x\log(1 + r)$$


The Exponential Function

$$Y = ab^X \qquad \text{Formula No. 103}$$

If the values of X increase in an arithmetic progression and the associated values of Y increase in a geometric progression, the relationship may be expressed by the exponential function.

It may be expressed in logarithms as:

$$\log Y = \log a + (\log b)X \qquad \text{Formula No. 104}$$

This line may be computed from the following normal equations:

$$\sum \log Y = N\log a + \log b\sum X$$
$$\sum (X\log Y) = \log a\sum X + \log b\sum X^2 \qquad \text{Formula No. 105}$$

This curve becomes a straight line when plotted on semi-logarithmic (ratio) paper. It is not suited to time series analysis.
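Since the compound interest curve is the exponential function with a = P and b = 1 + r, the normal equations of Formula No. 105 should recover both from a series of amounts. The Python sketch below works in common logarithms and takes antilogarithms at the end; the function name and the 5 per cent, ten-year example are hypothetical.

```python
import math


def fit_exponential(X, Y):
    """Fit Y = a * b**X from the normal equations of Formula No. 105.

    Solves for log a and log b, then returns their antilogarithms (a, b).
    """
    n = len(X)
    logs = [math.log10(y) for y in Y]
    sx, sl = sum(X), sum(logs)
    sxx = sum(x * x for x in X)
    sxl = sum(x * l for x, l in zip(X, logs))
    log_b = (n * sxl - sx * sl) / (n * sxx - sx * sx)
    log_a = (sl - log_b * sx) / n
    return 10 ** log_a, 10 ** log_b


# Hypothetical compound-interest amounts: P = 100 at r = 5% for 0..10 years.
years = list(range(11))
amounts = [100 * 1.05 ** x for x in years]
a, b = fit_exponential(years, amounts)
print(round(a, 4), round(b, 4))  # 100.0 1.05
```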


¹ In an elementary text in which only the more simple and widely used curvilinear functions and methods are presented, it seems better to include the non-linear equations of both non-time series and time series in the same chapter than to divide them.
Logistic Curve

Raymond Pearl and L. J. Reed developed the logistic curve, frequently called the Pearl-Reed growth curve, which measures or describes the usual expansion of a large population. The larger the population, usually, the more accurately this curve describes its growth. It conforms quite closely to the population expansion of the United States over the past 170 years. It measures fairly well the growth of the population of Boston, 1800-1940, as shown in Worksheet No. 116 and Fig. 98. It is not the fault of the curve, perhaps, that it does not conform to the population line more closely after 1920. The rapid growth of Boston during the half century following the Civil War was somewhat beyond a normal long-time expansion, and, therefore, some slackening of growth was to be expected. But, perhaps, the main reason for the divergence is that since 1920 there has been a movement of Bostonians to suburban areas which are beyond the city limits of Boston and do not show in its census figures, although for all practical purposes they are a part of greater Boston. If the problem had been based on the metropolitan area during the entire period, the curve would no doubt have fitted the population better. This curve would probably not fit the growth of small populations over brief periods because of erratic and temporary variations.

This curve, however, is quite useful in describing the growth of industries such as the railways, automobile production, rayon, coal, steel, electricity, and the like over a long period of time. It is primarily a time series trend device, because the X-variable must fall in units of uniform length or size.

The technique for its computation is simple, and the elementary student need have no difficulty with it. For a long-time period it gives a much better fit for the growth of most populations and industries than a straight line or a parabola. If, however, the period covered is long enough to include the decline or decay of the population after it has passed its peak, the parabola may give a better fit, as was indicated in Chapter 15.
WORKSHEET NO. 116, PART 1

Computation of a Logistic Curve for the Population Growth of Boston, Massachusetts, 1800 to 1940, for Decennial Census Figures

Year    Actual Population y    10,000,000/y
1800          24,937               401
1810          33,787               296
1820          43,298               231
1830          61,392               163
1840          93,383               107
1850         136,881                73
1860         177,840                56
1870         250,526                40
1880         302,839                33
1890         448,477                22
1900         560,892                17.8
1910         670,585                15.0
1920         748,060                13.4
1930         781,188                12.8
1940         770,816                13.0

Sub-totals: S1 (1800-1840) = 1,198;  S2 (1850-1890) = 224;  S3 (1900-1940) = 72

First differences: d1 = S2 - S1 = -974;  d2 = S3 - S2 = -152

Formula No. 106

$$C^n = \frac{d_2}{d_1}, \qquad b = \frac{d_1(C - 1)}{(C^n - 1)^2}, \qquad a = \frac{1}{n}\left(S_1 - \frac{d_1}{C^n - 1}\right)$$

in which n is the number of items in each sub-total (here 5).

Substitutions and Solutions

$$C^n = C^5 = \frac{d_2}{d_1} = \frac{-152}{-974} = +.15606$$

$$C = \sqrt[5]{.15606} = .6897$$

Computation of C: Log C = (1/5)(log .15606) = (1/5)(9.19328 - 10) = 1.838656 - 2. The antilogarithm of 1.838656 - 2 is the number .6897.

$$b = \frac{d_1(C - 1)}{(C^n - 1)^2} = \frac{-974(.6897 - 1)}{(.15606 - 1)^2} = \frac{-974 \times -.3103}{(-.84394)^2} = \frac{+302.23}{+.712235} = 424.34$$

$$a = \frac{1}{n}\left(S_1 - \frac{d_1}{C^n - 1}\right) = \frac{1}{5}\left(1{,}198 - \frac{-974}{-.84394}\right) = \frac{1}{5}(1{,}198 - 1{,}154) = \frac{1}{5}(44) = 8.8$$


WORKSHEET NO. 116, PART 2

 (1)     (2)     (3)       (4)       (5)          (6)
Year      x      C^x      bC^x     a + bC^x    y' = 10,000,000/(a + bC^x)
1800      0    1.0000     424.3     433.1          23,089
1810      1     .6897     292.6     301.4          33,178
1820      2     .4757     201.8     210.6          47,483
1830      3     .3281     139.2     148.0          67,567
1840      4     .2263      96.0     104.8          95,419
1850      5     .1561      66.2      75.0         133,333
1860      6     .1076      45.6      54.4         183,824
1870      7     .0742      31.5      40.3         248,139
1880      8     .0512      21.7      30.5         327,869
1890      9     .0353      15.0      23.8         420,168
1900     10     .0244      10.4      19.2         520,833
1910     11     .0168       7.1      15.9         628,931
1920     12     .0116       4.9      13.7         729,927
1930     13     .0080       3.4      12.2         819,672
1940     14     .0055       2.3      11.1         900,900

Explanations:

1. x = the powers to which C is raised.

2. C⁰ = 1, since any quantity to the zero power = 1.

3. The values of C^x are obtained by raising the quantity C = .6897 to the powers of x. Thus C² = .6897 × .6897 = .4757; C³ = .4757 × .6897 = .3281.

4. bC^x = 424.3 × 1.0000 = 424.3 for 1800, and 424.3 × .6897 = 292.6 for 1810, etc.
5. The 10,000,000 over y is selected arbitrarily and may be any power of 10 large enough that, when it is divided by the original data, it gives whole numbers without the necessity of using small fractions for accuracy. 100,000,000 would have served just as well or better in this case. A still larger number would have increased the work without improving the accuracy.

6. The y' trend values are obtained by multiplying the 10,000,000 used for the first computation by the reciprocal of a + bC^x, or

$$10{,}000{,}000 \times \frac{1}{a + bC^x}, \text{ which is equivalent to } \frac{10{,}000{,}000}{a + bC^x}$$

This type of line becomes an elongated S and continues to approach a maximum, or ceiling limit, as x increases.

Fig. 98. Logistic curve fitted to the population growth of Boston since 1800
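The whole of Worksheet No. 116 can be sketched in a few lines of Python. The function name is our own; it assumes, as the worksheet does, three sub-totals of n equally spaced reciprocal values and applies Formula No. 106 without intermediate rounding, so a, b, and C agree with the worksheet's 8.8, 424.3, and .6897 only to the precision printed there. Because C^x shrinks toward zero as x grows, the trend values approach the ceiling 10,000,000/a.

```python
def fit_logistic_three_sums(recip, n):
    """Fit Z = a + b * C**x to 3n equally spaced values by three sub-totals.

    recip holds the values 10,000,000 / y in time order (x = 0, 1, 2, ...).
    Returns (a, b, C) from the expressions of Formula No. 106.
    """
    if len(recip) != 3 * n:
        raise ValueError("need exactly three groups of n values")
    s1, s2, s3 = (sum(recip[i * n:(i + 1) * n]) for i in range(3))
    d1, d2 = s2 - s1, s3 - s2      # first differences of the sub-totals
    cn = d2 / d1                   # C**n; must be positive for a real root
    c = cn ** (1.0 / n)
    b = d1 * (c - 1) / (cn - 1) ** 2
    a = (s1 - d1 / (cn - 1)) / n
    return a, b, c


# 10,000,000 / y for Boston, 1800-1940, as entered in Worksheet No. 116.
recip = [401, 296, 231, 163, 107, 73, 56, 40, 33, 22,
         17.8, 15.0, 13.4, 12.8, 13.0]
a, b, c = fit_logistic_three_sums(recip, n=5)
trend = [10_000_000 / (a + b * c ** x) for x in range(len(recip))]
ceiling = 10_000_000 / a   # y' approaches this as x grows and C**x -> 0
print(round(c, 4), round(b, 1), round(a, 2), round(trend[0]))
```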


SUMMARY 

1. Most relationships between variables in the physical as well as in 
the social sciences are not straight line, but conform to some type of 
curvilinear function. 
2. Curvilinear lines may be computed from class averages and drawn on coordinate graphs of the plotted data. Such curves are often excellent descriptive devices, and by reading off the estimated values of the data from the class-average curve, values may be obtained comparable to the standard error of estimate and the coefficients of determination and correlation.

3. Mathematical curvilinear lines may be computed from any desired 
formula. The simplest and most commonly used functions are (1) the 
simple parabola, (2) the cubic parabola, (3) the exponential function or 
compound interest curve, and (4) the logistic and other growth curves. 

4. In selecting a mathematical curve for data, a prejudgment must 
usually be made as to the nature of the function which expresses the real 
relationship in the data. 

5. Unless the nature of the function existing in the population is known 
several curves may have to be computed before the correct one is dis- 
covered, or extensive experimentation may have to be done. 

6. If the nature of the function is known, a mathematical line is much superior to any free-hand or class-average line because it makes the information more precise and more easily available.

7. Unless a mathematical line does describe the true function, it be- 
comes merely a descriptive device and may be no better than a freehand 
line. 
REVIEW QUESTIONS

1. Why do straight-line trends not fit many series of data well? Give 
examples that are not straight-line. 

2. State the procedure in fitting a curvilinear free-hand regression 
line through class averages. 

3. Of what use is a free-hand curvilinear trend line and what are its 
deficiencies? 

4. How may a standard error of estimate be computed for a free- 
hand line? 

5. What are the normal equations required to compute a simple 
parabola? Write them. 

6. What is their form when changed to deviations from the means? 
What economies are effected by this change? Explain. 

7. Explain the required form of worksheet to compute a simple parabola from deviations from the means.

8. What are the advantages of the short cut method over the de- 
tailed method in the computation of the deviations from the mean? 
Explain in detail. 

9. Does a simple parabola fit well many series of data in the social 
sciences? Why? In the physical sciences? Why? 

10. Describe the method of computing the standard error of estimate for data fitted with a parabola.

11. What are the differences between a simple parabola and a cubic 
parabola? Explain in detail. 

12. What additions are required to the least squares worksheet of a 
simple parabola in order to compute a cubic parabola? 

13. Write the normal equations for a cubic parabola. 

14. Explain the principles of the Doolittle Method of solving equations. 

15. What is the Back Solution? 

16. What is the difference between the equation $Y = a + b_{yx}X$ and the equation $X = a + b_{xy}Y$? Explain fully.

17. Explain why no regression line is complete without its standard 
error of estimate. 




CHAPTER 24 

MULTIPLE AND PARTIAL 
CORRELATION 


All reference to multiple correlation is usually omitted from 
introductory texts as a calculation that is unnecessary or too 
difficult for a first semester's work in statistics. It is, however, 
often quite useful for a student who does not specialize in sta- 
tistics and may be presented in a form that is as easy to compute 
as simple correlation. 

In many statistical problems the relationships among the 
variables are so complex that simple correlation gives a very in- 
complete and inadequate answer to the question. Only mul- 
tiple correlation is sufficient for satisfactory results. In the case 
of the relationship of corn yield and rainfall, the very important 
factors of soil fertility, amount of fertilizer, methods of cultivation, 
time of planting, variety of corn, and amount of sunshine are all 
omitted. In any case corn yield is a resultant of many factors. 
Successive simple correlations between corn yield and each one 
of these independent variables give answers that are both in- 
complete and incorrect. In the method of simple correlation all 
the independent factors except one are eliminated and ignored. 
In multiple correlation all these independent variables may be 
included, measured, and their influence on the dependent factor 
duly apportioned and related. Another reason for presenting an 
introduction to multiple correlation in an elementary text is that 
it aids the student to grasp the total concept of correlation much 
more clearly. 

Although correlation does not prove a causal relationship among associated factors, it does measure the degree to which a factor or group of factors is associated with another factor. If, for example, the rate of farm income is associated with (1) size of farm, (2) number of cows on farms, and (3) wheat yield, it is possible to measure the degree of association between rate of income and the combined other factors. This is Multiple Correlation.

Since the several independent factors do not have equal degrees 
of association with the dependent factor, to obtain a correct 
measure of their total association with it, each one of them must 
be weighted according to its contribution to the total of the inde- 
pendent factors. If the association of wheat yield is twice as 
close as that of size of farm it must have twice as much weight in 
the total degree of association. So it is with all independent 
factors. Each one must be weighted properly in the total result. 
The multiple net regression coefficients are the correct weights, 
because they measure the relationship between the dependent 
factor and each independent factor after the influence of all the 
other factors has been taken into consideration and computed 
out or eliminated. 

To reduce these logical relationships to a mathematical formula 
three steps are necessary: 

1. Compute the deviations from the mean for each variable as 
a measure of the variation in that variable which is to be com- 
pared with the variation of the dependent factor. 

2. Express these summed deviations in terms of their own 
standard deviations in order to reduce them to a common standard 
of comparison, and, 

3. Compute the multiple net regression coefficients as relative 
weights for each variable. 

If the various factors are represented by X1, X2, X3, X4, ..., the deviations from the means will be x1, x2, x3, x4, .... If these deviations are summed and divided by their own standard deviations, the results are

$$\frac{x_1}{\sigma_1},\ \frac{x_2}{\sigma_2},\ \frac{x_3}{\sigma_3},\ \frac{x_4}{\sigma_4},\ \ldots$$

If the multiple net regression coefficients are now used as weights of the relative importance of each factor they become
When the coefficients for the factors are solved by the least squares multiple net regression method, the correlation equation becomes



in which the correct weight of each independent variable is included in a total, which is expressed as a ratio to the total variation in the dependent variable. This ratio is $R^2_{1.234\ldots n}$, or the coefficient of Multiple Determination. It expresses as a percentage the amount of the total variation in the dependent variable which is associated with the combined variation of the independent variables. If, for instance, $R^2_{1.234\ldots n} = .75$, one is to understand that 75% of the variation of X1 is associated with the weighted combined variation of X2 + X3 + X4 + ... + Xn, and that 25% of the variation of X1 is not associated with the independent variables included in this problem, but must be accounted for by other variables.

Meaning of symbols:

$X_1$ = the dependent variable, usually designated by Y in simple correlation

$X_2, X_3, X_4$, etc. = the several independent variables

R = multiple straight-line coefficient of correlation

$R^2$ = multiple straight-line coefficient of determination

$b_{12.34\ldots n}$ = net regression coefficient between $X_1$ and $X_2$

$b_{13.24\ldots n}$ = net regression coefficient between $X_1$ and $X_3$

$b_{14.23\ldots n}$ = net regression coefficient between $X_1$ and $X_4$

The multiple straight-line regression equation is

$$X_1 = a_{1.234\ldots n} + b_{12.34\ldots n}X_2 + b_{13.24\ldots n}X_3 + b_{14.23\ldots n}X_4 + \cdots + b_{1n.23\ldots(n-1)}X_n$$





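For four variables the equations are:

Formula No. 107

$$a_{1.234} = \bar{X}_1 - b_{12.34}\bar{X}_2 - b_{13.24}\bar{X}_3 - b_{14.23}\bar{X}_4$$

Formula No. 108

$$\begin{aligned}
\textstyle\sum x_2^2\, b_{12.34} + \sum x_2x_3\, b_{13.24} + \sum x_2x_4\, b_{14.23} &= \textstyle\sum x_1x_2\\
\textstyle\sum x_2x_3\, b_{12.34} + \sum x_3^2\, b_{13.24} + \sum x_3x_4\, b_{14.23} &= \textstyle\sum x_1x_3\\
\textstyle\sum x_2x_4\, b_{12.34} + \sum x_3x_4\, b_{13.24} + \sum x_4^2\, b_{14.23} &= \textstyle\sum x_1x_4
\end{aligned}$$

All the above formulas may be expanded as desired to include more variables.

Formula No. 109

$$S_1^2 = \frac{\sum z^2}{N - M} \quad\text{or}\quad S_1^2 = \frac{\sum x_1^2 - \sum (x_1')^2}{N - M}$$

in which z is the difference between each actual and estimated value of X1, N is the number of observations, and M is the number of constants in the regression equation.

S = check sum of paired variables.

Multiple net regression equations and coefficients form the basis for multiple determination. If the multiple net regression equations express straight-line relationships, the coefficient of multiple determination is based on a straight-line relationship and is indicated by the symbol $R^2_{1.234\ldots n}$.

Formula No. 110

$$R^2_{1.234\ldots n} = 1 - \frac{S_1^2}{\sigma_1^2}$$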


If the multiple net regression equations and coefficients are curvilinear, the coefficient of multiple determination is based on curvilinear relationships and is indicated by the symbol Ρ (capital rho).

Formula No. 111

The distinction between R and Ρ is quite important. If any type of curvilinear regression is used in the solution, the correlation should be designated with Ρ.
BRIEF MODEL WORKSHEET FOR COMPUTING
MULTIPLE CORRELATION¹

The three Worksheets No. 117, No. 118, and No. 119 indicate the successive steps to be taken in the computations. No. 117 lists the original data and the computations of the squares of each variable and of all required products. A "check sum," designated S, is included to check and locate any errors in the arithmetic processes. No. 118 contains the computation of the equations in terms of deviations from the means. No. 119 is the solution of the normal equations by the Doolittle Method, which is a great economy, as the student has already discovered in computing curvilinear regression lines.²


WORKSHEET NO. 117

   Original Data        Extensions with X2        Extensions with X3     Extensions with X1
X2   X3   X1     S      X2²   X2X3  X2X1   X2S     X3²   X3X1   X3S       X1²    X1S
 5    2    4    11       25     10    20    55       4      8    22        16     44
10    3    7    20      100     30    70   200       9     21    60        49    140
 5    2    3    10       25     10    15    50       4      6    20         9     30
 7    5    8    20       49     35    56   140      25     40   100        64    160
 8    5    9    22       64     40    72   176      25     45   110        81    198
 8    2    6    16       64     16    48   128       4     12    32        36     96
 6    2    5    13       36     12    30    78       4     10    26        25     65

Sums
49   21   42   112      363    153   311   827      75    142   370       280    733

Means
 7    3    6    16


1. This worksheet is prepared by setting down the original data under X2, X3, X1 and adding each line of data crosswise to get S, the check sum.

2. To obtain Extensions with X2, multiply each line of the original data, including S, by the value of X2 in that line, thus 5 × 5 = 25, 5 × 2 = 10, 5 × 4 = 20, 5 × 11 = 55, etc.

3. To obtain Extensions with X3, multiply each line of the original data, including S, beginning with X3, by the X3 value, as 2 × 2 = 4, 2 × 4 = 8, 2 × 11 = 22, etc.

4. To obtain the Extensions with X1, multiply the values of each line of original data, beginning with X1, by the X1 value, as 4 × 4 = 16, 4 × 11 = 44.

It will be noted that in each successive series of extensions the values of the previous X-columns are omitted; that is, for Extensions with X2 all the values are included, but for Extensions with X3 we begin with X3, omitting X2 entirely. For Extensions with X1, we begin with X1, omitting both X2- and X3-values. The reasons for this are (1) that all the omitted values are included in the previous extensions and do not need to be repeated, and (2) that omitting them greatly reduces the size of the equations and the work of solution.

¹ These worksheets are adapted from Ezekiel, M., Methods of Correlation Analysis, Chapter 12, and Appendix I, pp. 462-4, John Wiley and Sons, New York, 1941, and are used with the permission of the author and publisher.

² Chapter 23.


WORKSHEET NO. 118

Line                                   X2     X3     X1      S
1     Extensions with X2              363    153    311    827
2     Corrections                     343    147    294    784
3     Deviations (Equation I)          20      6     17     43
4     Extensions with X3                      75    142    370
5     Corrections                             63    126    336
6     Deviations (Equation II)                12     16     34
7     Extensions with X1                            280    733
8     Corrections                                   252    672
9     Deviations                                     28     61

Worksheet No. 118 is constructed as follows:

1. The totals of the Extensions with X2 are set down as line 1.

2. The corrections for deviations from the mean are computed by multiplying all the totals of Original Data, including S, by the mean of X2, as 7 × 49 = 343, 7 × 21 = 147, 7 × 42 = 294, 7 × 112 = 784.

3. Subtract the corrections (line 2) from the extensions (line 1), and the result (line 3) is Equation I in terms of deviations from the mean.

4. Extensions with X3 in Worksheet No. 117 are set down in line 4 of Worksheet No. 118, and the corrections are computed as follows: 3 × 21 = 63, 3 × 42 = 126, 3 × 112 = 336. Subtract the corrections (line 5) from the extensions (line 4), and the result is Equation II in line 6.

5. Repeat this process for each series of extensions.
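The extensions and corrections of Worksheets No. 117 and No. 118 amount to computing, for each pair of columns, the sum of the products minus the mean of the first column times the total of the second. The Python sketch below (function name hypothetical) reproduces the deviation lines of Worksheet No. 118 from the seven observations and checks Equation I against its check sum.

```python
def deviation_sums(rows):
    """Worksheets No. 117 and 118: extensions, corrections, and deviations.

    rows: list of (X2, X3, X1) observations. Returns a dict keyed by pairs of
    column names, e.g. ("X2", "X1") -> Σx2x1, including the check sum S.
    """
    names = ["X2", "X3", "X1", "S"]
    data = [list(r) + [sum(r)] for r in rows]     # append the check sum S
    n = len(data)
    totals = [sum(col) for col in zip(*data)]
    means = [t / n for t in totals]
    result = {}
    for i in range(3):                             # extensions with X2, X3, X1
        for j in range(i, 4):                      # begin with the same column
            extension = sum(row[i] * row[j] for row in data)
            correction = means[i] * totals[j]      # mean of X_i times total of X_j
            result[(names[i], names[j])] = extension - correction
    return result


dev = deviation_sums([(5, 2, 4), (10, 3, 7), (5, 2, 3), (7, 5, 8),
                      (8, 5, 9), (8, 2, 6), (6, 2, 5)])
# Check: the S deviation in line 3 equals the sum of the other deviations in it.
assert dev[("X2", "S")] == dev[("X2", "X2")] + dev[("X2", "X3")] + dev[("X2", "X1")]
print(dev[("X2", "X2")], dev[("X2", "X3")], dev[("X2", "X1")], dev[("X2", "S")])
```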

WORKSHEET NO. 119

Doolittle Solution of Regression Equations

                              b12.3     b13.2       X1      Check Sum
Equation I                     20         6      =  17          43
Equation I'                    -1       -.3      = -.85       -2.15
Equation II                   (+6)       12      =  16.0        34
Equation I × (-.3)            (-6)     -1.8      = -5.1       -12.9
Sum (II + I × (-.3))                   10.2      =  10.9        21.1
Equation II'                             -1      = -1.07       -2.07

Back Solution

     b12.3      b13.2
      .85        1.07
     -.321
      .529

$$a = \bar{X}_1 - b_{12.3}\bar{X}_2 - b_{13.2}\bar{X}_3 = 6 - (.529 \times 7) - (1.07 \times 3) = 6 - 3.703 - 3.21 = 6 - 6.913 = -.913$$

$$X_1 = a + b_{12.3}X_2 + b_{13.2}X_3 = -.913 + .529X_2 + 1.07X_3$$

The Back Solution is only a short method of substitution. The values under X1 in the derived equations I' and II', -.85 and -1.07, are written under b12.3 and b13.2 with the signs changed. The value of b13.2 (1.07) is then multiplied by the coefficient of b13.2 in Equation I', which is -.3, and the product, -.321, is written in the back solution under b12.3. The values .85 and -.321 are added, which gives the value of b12.3, or .529. The student should become thoroughly familiar with this short method of substitution.
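For two independent variables the forward and back solutions of Worksheet No. 119 shrink to a few lines. This Python sketch (function name hypothetical) takes its inputs from the deviation lines of Worksheet No. 118 and carries full precision, so it gives b12.3 = .5294, b13.2 = 1.0686, and a = -.912 rather than the rounded .529, 1.07, and -.913.

```python
def net_regression_two(sx22, sx23, sx33, sx12, sx13, means):
    """Solve the two normal equations of Worksheet No. 119.

    sx22*b12 + sx23*b13 = sx12   (Equation I)
    sx23*b12 + sx33*b13 = sx13   (Equation II)
    means: (mean X1, mean X2, mean X3). Returns (a, b12.3, b13.2).
    """
    # Forward solution: add Equation I times -sx23/sx22 to Equation II.
    m = -sx23 / sx22
    b13 = (sx13 + m * sx12) / (sx33 + m * sx23)
    # Back solution: substitute b13.2 into Equation I.
    b12 = (sx12 - sx23 * b13) / sx22
    m1, m2, m3 = means
    a = m1 - b12 * m2 - b13 * m3
    return a, b12, b13


a, b12, b13 = net_regression_two(20, 6, 12, 17, 16, (6, 7, 3))
print(round(b12, 4), round(b13, 4), round(a, 3))  # 0.5294 1.0686 -0.912
```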
Multiple Determination

Formula No. 112

$$R^2_{1.23} = \frac{b_{12.3}(\sum x_1x_2) + b_{13.2}(\sum x_1x_3)}{\sum x_1^2}$$

This is merely an extension of Formulas No. 51 and No. 52 to include two or more independent variables. For only two variables the formula may be stated as

$$r^2 = \frac{b_{yx}(\sum xy)}{\sum y^2} \quad\text{or}\quad r^2 = b_{yx}\,b_{xy}$$

For two or more independent variables the formula becomes:

$$R^2_{1.23} = \frac{b_{12.3}(\sum x_1x_2) + b_{13.2}(\sum x_1x_3)}{\sum x_1^2} = \frac{.529 \times 17 + 1.07 \times 16}{28} = \frac{8.993 + 17.12}{28} = \frac{26.113}{28} = .933$$


By splitting the numerator of Formula No. 112 into as many terms as there are b's, each over the common denominator, we arrive at the formulas for

Separate Determination

Formula No. 113

$$d_{12.3} = \frac{b_{12.3}\,\Sigma x_1x_2}{\Sigma x_1^2} = \frac{8.993}{28} = .3212$$

Formula No. 114

$$d_{13.2} = \frac{b_{13.2}\,\Sigma x_1x_3}{\Sigma x_1^2} = \frac{17.12}{28} = .6114$$


By this method the total determination, .933, is divided into the separate determinations d12.3 = .3212 and d13.2 = .6114. These coefficients signify that X2 and X3 together account for 93.3% of the variation of X1, of which X2 accounts for 32.12% and X3 for 61.14% (.3212 + .6114 = .9326, or .933). This computation is useful in that it measures both the total and the separate determination.
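Formulas No. 112 to No. 114 amount to a few multiplications, sketched below with the rounded coefficients .529 and 1.07 from the back solution. The variable names are illustrative only; the point is that the separate determinations add up to the total.

```python
# Sums of products of deviations for the three-variable example.
sum_x1x2, sum_x1x3, sum_x1_sq = 17.0, 16.0, 28.0
b12_3, b13_2 = 0.529, 1.07          # rounded coefficients from the back solution

# Separate determinations (Formulas No. 113 and No. 114).
d12_3 = b12_3 * sum_x1x2 / sum_x1_sq
d13_2 = b13_2 * sum_x1x3 / sum_x1_sq

# Total multiple determination (Formula No. 112) is their sum.
r2_1_23 = d12_3 + d13_2
print(f"d12.3 = {d12_3:.4f}, d13.2 = {d13_2:.4f}, R^2 = {r2_1_23:.3f}")
```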


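Standard Error of Estimate for Multiple Determination
Formula No. 115

$$S_1^2 = \frac{\Sigma x_1^2 - (b_{12.3}\,\Sigma x_1x_2 + b_{13.2}\,\Sigma x_1x_3)}{N - M} = \frac{28 - 26.113}{7 - 3} = \frac{1.887}{4} = .472$$

Formula No. 116

$$S_1^2 = \sigma_1^2\,(1 - R^2_{1.23}), \qquad \sigma_1^2 = \frac{\Sigma x_1^2}{N} = \frac{28}{7} = 4, \qquad \sigma_1 = 2$$

$$S_1 = 2\sqrt{1 - .933} = 2\sqrt{.067} = 2 \times .258 = .516$$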

The total determination, .933, is the portion of the variation of X1 that is explained by X2 and X3 together. The remainder, 1 − .933 = .067, is the portion of the variation of X1 that is not explained by X2 and X3. In other words, X2 and X3 together account for 93.3% of the variation in X1; X2 accounts for 32.12% and X3 for 61.14%. The remaining 6.7% of the variation in X1 has yet to be explained by other variables.
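The two standard-error formulas can be compared directly. In the sketch below (names are ours), Formula No. 115 divides the unexplained variation by N − M, while Formula No. 116 works out to dividing the same unexplained variation by N, so the two results are not expected to agree: √.472 is about .687, against about .518 from Formula No. 116 (the text's .516 rounds √.067 to .258).

```python
import math

sum_x1_sq = 28.0       # total variation of X1 (sum of squared deviations)
explained = 26.113     # b12.3 Σx1x2 + b13.2 Σx1x3 from Formula No. 112
n, m = 7, 3            # N items, M variables (X1, X2, X3)

# Formula No. 115: unexplained variation over the degrees of freedom N - M.
s1_sq_df = (sum_x1_sq - explained) / (n - m)

# Formula No. 116: S1 = sigma1 * sqrt(1 - R^2), with sigma1^2 = Σx1^2 / N.
sigma1 = math.sqrt(sum_x1_sq / n)
r2 = 0.933
s1 = sigma1 * math.sqrt(1 - r2)
print(f"Formula 115: S1^2 = {s1_sq_df:.3f}; Formula 116: S1 = {s1:.3f}")
```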



590 MULTIPLE AND PARTIAL CORRELATION 


PRACTICAL EXAMPLE OF MULTIPLE CORRELATION 
APPLIED TO ECONOMIC DATA 

It is well known by practical businessmen, as by professional 
economists, that the profits of any particular business venture or 
unit are affected by and associated with many diverse factors. 
The profits of a drug store may rest on (1) location, (2) delivery service, (3) quality of goods sold, (4) dependability of pharmacist, (5) advertising, (6) efficiency of management, (7) personality of clerks, and many other factors. These factors may not all be equal in importance.

The profits of a factory may be associated with (1) loyalty of labor force, (2) nearness to raw materials, (3) cheapness of transportation, (4) nearness to consumer market, (5) credit standing, (6) good will, and many other factors of varying strength and weight.

The relationships treated in the sciences of sociology, psy- 
chology, education, biology, business management, as well as 
economics are all measurable by multiple correlation insofar as 
they may be stated in quantitative or measurable terms. Di- 
vorces may be associated with (1) economic conditions of the 
home, (2) different ages of couple, (3) different religious views, 
(4) sex inhibitions, and many other factors. The achievement of 
students in school may be due to (1) I.Q., (2) home environment, 
(3) health, (4) quality of teachers, (5) interest in subject, and 
scores of other factors of various weights. Educators have made 
extensive use of correlation in educational research, but these 
problems are so complex that only multiple correlation is ade- 
quate to treat many of them. The student will readily recall 
many other fields of study and particular cases in which the 
powerful statistical analytical device of multiple correlation may 
be used. 

The following problem on the rate of return on nineteen mixed 
wheat and dairy farms of various sizes is a good illustration of 
the detailed procedures and methods for the computation of exten- 
sions for variables. 
WORKSHEET NO. 120

Computation of Extensions for Four Variables Using Check Sum¹

              Variables                               Extensions with X2
   X2    X3    X4     X1        S          X2²      X2X3       X2X4       X2X1            X2S
  220     9    16    3.5    248.5       48,400     1,980      3,520      770.0       54,670.0
  325    16    20    7.2    368.2      105,625     5,200      6,500    2,340.0      119,665.0
  520     8    17    9.0    554.0      270,400     4,160      8,840    4,680.0      288,080.0
  400    13    16    4.2    433.2      160,000     5,200      6,400    1,680.0      173,280.0
  285    18    27   16.9    346.9       81,225     5,130      7,695    4,816.5       98,866.5
  297    14    22    9.7    342.7       88,209     4,158      6,534    2,880.9      101,781.9
  307    11    20    5.2    343.2       94,249     3,377      6,140    1,596.4      105,362.4
  215     8    20    6.5    249.5       46,225     1,720      4,300    1,397.5       53,642.5
  220     1    20    8.3    249.3       48,400       220      4,400    1,826.0       54,846.0
  255     9    22   10.0    296.0       65,025     2,295      5,610    2,550.0       75,480.0
  225     3    20    5.2    253.2       50,625       675      4,500    1,170.0       56,970.0
  400     8    25   14.6    447.6      160,000     3,200     10,000    5,840.0      179,040.0
  135     3    20    5.1    163.1       18,225       405      2,700      688.5       22,018.5
  148     7    15    4.0    174.0       21,904     1,036      2,220      592.0       25,752.0
  360     4    22    6.2    392.2      129,600     1,440      7,920    2,232.0      141,192.0
  280     6    20   10.2    316.2       78,400     1,680      5,600    2,856.0       88,536.0
  480     6    25    7.4    518.4      230,400     2,880     12,000    3,552.0      248,832.0
  305     7    18    5.6    335.6       93,025     2,135      5,490    1,708.0      102,358.0
  400     9    23   10.6    442.6      160,000     3,600      9,200    4,240.0      177,040.0
-----------------------------------------------------------------------------------------------
5,777   160   388  149.4  6,474.4    1,949,937    50,491    119,569   47,415.8    2,167,412.8



      Extensions with X3                Extensions with X4            Extensions with X1
  X3²   X3X4    X3X1       X3S       X4²    X4X1        X4S         X1²         X1S
   81    144    31.5    2,236.5      256    56.0      3,976.0      12.25      869.75
  256    320   115.2    5,891.2      400   144.0      7,364.0      51.84    2,651.04
   64    136    72.0    4,432.0      289   153.0      9,418.0      81.00    4,986.00
  169    208    54.6    5,631.6      256    67.2      6,931.2      17.64    1,819.44
  324    486   304.2    6,244.2      729   456.3      9,366.3     285.61    5,862.61
  196    308   135.8    4,797.8      484   213.4      7,539.4      94.09    3,324.19
  121    220    57.2    3,775.2      400   104.0      6,864.0      27.04    1,784.64
   64    160    52.0    1,996.0      400   130.0      4,990.0      42.25    1,621.75
    1     20     8.3      249.3      400   166.0      4,986.0      68.89    2,069.19
   81    198    90.0    2,664.0      484   220.0      6,512.0     100.00    2,960.00
    9     60    15.6      759.6      400   104.0      5,064.0      27.04    1,316.64
   64    200   116.8    3,580.8      625   365.0     11,190.0     213.16    6,534.96
    9     60    15.3      489.3      400   102.0      3,262.0      26.01      831.81
   49    105    28.0    1,218.0      225    60.0      2,610.0      16.00      696.00
   16     88    24.8    1,568.8      484   136.4      8,628.4      38.44    2,431.64
   36    120    61.2    1,897.2      400   204.0      6,324.0     104.04    3,225.24
   36    150    44.4    3,110.4      625   185.0     12,960.0      54.76    3,836.16
   49    126    39.2    2,349.2      324   100.8      6,040.8      31.36    1,879.36
   81    207    95.4    3,983.4      529   243.8     10,179.8     112.36    4,691.56
-------------------------------------------------------------------------------------
1,706  3,316 1,361.5   56,874.5    8,110 3,210.9    134,205.9   1,403.78   53,391.98

¹ This table is adapted from Mordecai Ezekiel, Methods of Correlation Analysis, p. 360, and used by permission of John Wiley and Sons, Inc., New York.
SHORT METHOD OF COMPUTING EXTENSIONS
IN WORKSHEET NO. 120

1. Set the data down in the first columns under the heading "Variables."

2. Add the data in each line crosswise for the check sum. See that the total of the check sum column exactly equals the sum of the totals of all the variables.

3. For Extensions with X2, multiply all the items in each line by the X2 value in that line.

4. For Extensions with X3, multiply all the items in the line, beginning with column X3, by the value in the X3 column.

5. In each succeeding set of extensions, begin with the column of data for which extensions are being computed. That is, the Extensions with X4 begin with column X4, Extensions with X5 begin with column X5, and so on to X1, which is always the last.

The check sum column in each set of extensions checks all the totals for that variable. Check sum X2S should equal the sum of the totals of all columns containing an X2. The check sum X3S should equal the sum of the totals of all columns containing an X3 (including X2X3), and so on for all variables.
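The five steps and the check-sum tests can be sketched in Python for the nineteen farms of Worksheet No. 120. This is an illustration, not the author's procedure: ordinary floating-point arithmetic stands in for the hand work, and the helper `total` is a hypothetical convenience that looks up a pair of variables in either order.

```python
# Worksheet No. 120 data: (X2, X3, X4, X1) for each of the nineteen farms.
farms = [
    (220, 9, 16, 3.5), (325, 16, 20, 7.2), (520, 8, 17, 9.0), (400, 13, 16, 4.2),
    (285, 18, 27, 16.9), (297, 14, 22, 9.7), (307, 11, 20, 5.2), (215, 8, 20, 6.5),
    (220, 1, 20, 8.3), (255, 9, 22, 10.0), (225, 3, 20, 5.2), (400, 8, 25, 14.6),
    (135, 3, 20, 5.1), (148, 7, 15, 4.0), (360, 4, 22, 6.2), (280, 6, 20, 10.2),
    (480, 6, 25, 7.4), (305, 7, 18, 5.6), (400, 9, 23, 10.6),
]
names = ["X2", "X3", "X4", "X1", "S"]

# Step 2: add each line crosswise to get the check sum S.
rows = [list(f) + [sum(f)] for f in farms]
totals = [sum(r[k] for r in rows) for k in range(len(names))]
assert abs(totals[-1] - sum(totals[:-1])) < 1e-6  # S total = sum of variable totals

# Steps 3-5: extensions with each variable begin at that variable's column.
ext = {}
for i, a in enumerate(names[:-1]):
    for j in range(i, len(names)):
        ext[(a, names[j])] = sum(r[i] * r[j] for r in rows)

def total(a, b):
    """Hypothetical helper: extension total for a pair given in either order."""
    return ext[(a, b)] if (a, b) in ext else ext[(b, a)]

# Check-sum test: each XS total equals the sum of every extension containing X.
for a in names[:-1]:
    assert abs(total(a, "S") - sum(total(a, b) for b in names[:-1])) < 1e-6

print(f"X2^2 = {total('X2', 'X2'):,.0f}, X2S = {total('X2', 'S'):,.1f}")
```

Note that the X3S test, like the worksheet's, needs the X2X3 total from the first set of extensions.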


WORKSHEET NO. 121

Computations of Deviations from Means with Check Sum

Line                        X2              X3             X4             X1              S
 1   Sums                   5,777           160            388            149.4           6,474.4
 2   Means                  304.05263       8.42105        20.42105       7.86316         340.75789
 3   Extensions X2          1,949,937       50,491         119,569        47,415.8        2,167,412.8
 4   Corrections            1,756,512.04    48,648.42      117,972.42     45,425.46       1,968,558.35
 5   Deviations of X2       193,424.96      1,842.58       1,596.58       1,990.34        198,854.45
 6   Extensions X3                          1,706          3,316          1,361.50        56,874.50
 7   Corrections                            1,347.37       3,267.37       1,258.10        54,521.25
 8   Deviations of X3                       358.63         48.63          103.40          2,353.25
 9   Extensions X4                                         8,110          3,210.90        134,205.90
10   Corrections                                           7,923.37       3,050.90        132,214.05
11   Deviations of X4                                      186.63         160.00          1,991.85
12   Extensions X1                                                        1,403.78        53,391.98
13   Corrections                                                          1,174.76        50,909.24
14   Deviations of X1                                                     229.02          2,482.74
The corrections in Worksheet No. 121 are computed by multiplying the line of Sums successively by each of the Means. For the corrections for X2, all the sums are multiplied by the mean of X2. These multiplications are (5,777)(304.05263), (160)(304.05263), (388)(304.05263), and so on through the check-sum total. For the corrections for X3, the sums beginning with X3 (160 in this case) are multiplied through by the mean of X3. These computations are (160)(8.42105), (388)(8.42105), and so on through the check sum. The sums of products of the little x's, or deviations from the means (Σx2², Σx2x3, Σx3², and so on), are obtained by subtracting the corrections from the extensions in the line above.
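The correction step can be sketched in Python from the totals alone. The sketch assumes the worksheet's rule, deviation sum = extension − (sum of the second variable) × (mean of the first variable); because it uses unrounded means, results agree with Worksheet No. 121 to about two decimal places.

```python
n = 19
# Line 1 of Worksheet No. 121: totals of the original data.
sums = {"X2": 5777, "X3": 160, "X4": 388, "X1": 149.4}
# Extensions (sums of products of the original data) from Worksheet No. 120.
extensions = {
    ("X2", "X2"): 1949937, ("X2", "X3"): 50491, ("X2", "X4"): 119569,
    ("X2", "X1"): 47415.8, ("X3", "X3"): 1706, ("X3", "X4"): 3316,
    ("X3", "X1"): 1361.5, ("X4", "X4"): 8110, ("X4", "X1"): 3210.9,
    ("X1", "X1"): 1403.78,
}

# Correction = (sum of the second variable) x (mean of the first variable);
# subtracting it from the extension leaves the sum of products of deviations.
deviations = {}
for (a, b), value in extensions.items():
    correction = sums[b] * sums[a] / n
    deviations[(a, b)] = value - correction

for (a, b), value in deviations.items():
    print(f"sum x{a[1:]}x{b[1:]} = {value:,.2f}")
```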

WORKSHEET NO. 122

Solution of Normal Equations by the Doolittle Method with Check Sum*

Line      X2              X3            X4            X1            S
I         193,424.96      1,842.58      1,596.58      1,990.34      198,854.45
I'        -1.00000        -.00953       -.00825       -.01029       -1.02807
II        (1,842.58)      358.63        48.63         103.40        2,353.25
          (-1,842.58)     -17.56        -15.22        -18.97        -1,895.08
2                         341.07        33.41         84.43         458.17
II'                       -1.00000      -.09796       -.24754       -1.34333
III       (1,596.58)      (48.63)       186.63        160.00        1,991.85
          (-1,596.58)     (-15.20)      -13.17        -16.42        -1,640.55
                          (-33.41)      -3.27         -8.27         -44.88
3                                       170.19        135.31        306.42
III'                                    -1.00000      -.79505       -1.80045

Back Solution

b12.34        b13.24        b14.23
.01029        .24754        .79505
-.00656       -.07788
-.00162
.00211        .16966        .79505

* See explanation of Doolittle Method in Chapter 23.
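The forward and back solution with a check-sum column can be sketched compactly. This is ordinary Gaussian elimination arranged like the Doolittle layout, not a transcription of Worksheet No. 122: it does not write out the normalized rows I', II', III', and it carries more decimals, so the last digits may differ slightly from the worksheet.

```python
# Normal equations in deviation form (Worksheet No. 121), unknowns in the order
# b12.34, b13.24, b14.23. Each row: three coefficients, then the right side Σx1xi.
rows = [
    [193424.96, 1842.58, 1596.58, 1990.34],
    [1842.58, 358.63, 48.63, 103.40],
    [1596.58, 48.63, 186.63, 160.00],
]
for row in rows:
    row.append(sum(row))  # check-sum column S

k = len(rows)
# Forward solution: eliminate one unknown at a time, carrying S along.
for p in range(k):
    for q in range(p + 1, k):
        factor = rows[q][p] / rows[p][p]
        rows[q] = [x - factor * y for x, y in zip(rows[q], rows[p])]
        # Check: S must still equal the sum of the other entries in the row.
        assert abs(sum(rows[q][:-1]) - rows[q][-1]) < 1e-6 * abs(rows[q][-1]) + 1e-9

# Back solution: the last coefficient first, then substitute upward.
b = [0.0] * k
for p in reversed(range(k)):
    known = sum(rows[p][j] * b[j] for j in range(p + 1, k))
    b[p] = (rows[p][k] - known) / rows[p][p]

b12_34, b13_24, b14_23 = b
print(f"b12.34 = {b12_34:.5f}, b13.24 = {b13_24:.5f}, b14.23 = {b14_23:.5f}")
```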
MEANING OF NET REGRESSION COEFFICIENTS

A regression coefficient computed for two variables only leaves entirely out of the picture the effects of all other independent variables. If a regression coefficient is computed for wheat yields (X2) and income (X1) alone, no account is taken of the influence on income of such evidently present factors as (X3) size of farm, (X4) rainfall, (X5) fertilizer, (X6) man power, and many other factors. These other variables are simply excluded from the problem. This elimination of vital factors usually results in exaggerating the coefficient of the variable which is included. When these other factors are included in the problem and multiple net regression equations are computed for all of them at once, each net regression coefficient is corrected for the influence of the others by the mathematical process of holding all the others constant while the value of this particular one is computed. In the first case, the other independent variables are simply eliminated. In the second case, they are all included in the problem and adjustments are made for their separate effects. The latter method, the use of multiple net regression equations and coefficients, gives a much more accurate analysis.

For the variables and equations developed in Worksheets No. 120 and No. 121, for income (X1), size of farms (X2), number of cows on farms (X3), and yield of wheat per acre (X4), the differences between the two methods are as follows:

1. b12 (the regression coefficient between income and size of farm alone, Σx1x2/Σx2² = 1,990.34/193,424.96) = .01029.

b12.34 (the net regression coefficient between income and size of farms after the influence of the number of cows (X3) and wheat yields (X4) has been removed) = .00211.

The net regression coefficient is only about one-fifth as large as the simple coefficient. The simple coefficient, b12, was magnified beyond its true size by the influence of X3 and X4, which had not been computed out.

2. b13 (the regression coefficient between income and number of cows on farms, 103.40/358.63) = .2883.

b13.24 (the net regression coefficient between income and number of cows on farms after the influence of the size of farms (X2) and wheat yields (X4) has been removed) = .16966. The net coefficient is less than two-thirds as large as the simple coefficient. In b13 the true relationship is obscured by the influence of X2 and X4, which had not been computed out.

3. b14 (the regression coefficient between income and wheat yield per acre, 160.00/186.63) = .8573.

b14.23 (the net regression coefficient between income and wheat yields after the influence of size of farms (X2) and number of cows on farms (X3) has been removed) = .79505. The net regression coefficient is about eleven-twelfths as large as the simple coefficient. The multiple net regression coefficients appear at the end of the back solution in Worksheet No. 122. This example is a fair illustration of the principle stated in the preceding paragraph: a simple regression coefficient tends to obscure the true value of the relationship between two variables. The multiple net regression coefficients approach more nearly the true relationship.

The coefficient b12 expresses an obscured and usually magnified relationship between X1 and X2. The net regression coefficient b12.3 measures the relationship between X1 and X2 after the influence of X3 has been taken into consideration and its weight removed from the effect of X2 on X1. In the still more complex net regression coefficient, b12.345, the net relationship between X2 and X1 is computed after the effects of X3, X4, and X5 have been brought into the problem and their effects on the relationship between X2 and X1 have been computed out. The fact that the influence of the other independent variables has been removed in each case is the reason such coefficients are called net regression coefficients. They are much more exact measures of the relationship between variables than the simple two-variable coefficients.

WORKSHEET NO. 123

Final Steps in Solution of Multiple Correlation Problem: Check, Computation of R² and of a*

            (1)            (2)          (3)          (4)          (5)            (6)       (7)
            Regression     Equation     Check        Equation     Computation    Means     Computation
Variables   Coefficients   III          (1) × (2)    I            of R²                    of a
X2          .00211         1,596.58     3.36878      1,990.34     4.19962        304.05    .6415
X3          .16966         48.63        8.25057      103.40       17.54284       8.42      1.4285
X4          .79505         186.63       148.38018    160.00       127.20800      20.42     16.2349
Sums                       160.00       159.99953    229.02       148.95046      7.86      18.3049

* Adapted from Mordecai Ezekiel, Methods of Correlation Analysis, and used by permission of John Wiley and Sons, Inc., New York.


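$$a_{1.234} = \bar X_1 - b_{12.34}\bar X_2 - b_{13.24}\bar X_3 - b_{14.23}\bar X_4 = 7.86 - 18.30 = -10.44$$

$$X_1 = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4 = -10.44 + .00211X_2 + .16966X_3 + .79505X_4$$

$$R^2_{1.234} = \frac{b_{12.34}\Sigma x_1x_2 + b_{13.24}\Sigma x_1x_3 + b_{14.23}\Sigma x_1x_4}{\Sigma x_1^2} = \frac{4.19962 + 17.54284 + 127.20800}{229.02} = \frac{148.95046}{229.02} = .6504$$

and $R_{1.234} = .8065$.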
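Columns (5) and (7) of Worksheet No. 123 can be reproduced with the sketch below, which takes the coefficients and means as printed in Worksheets No. 121 and No. 122; the dictionary names are our own.

```python
import math

# Regression coefficients (Worksheet No. 122) and means (Worksheet No. 121).
b = {"X2": 0.00211, "X3": 0.16966, "X4": 0.79505}
means = {"X1": 7.86316, "X2": 304.05263, "X3": 8.42105, "X4": 20.42105}
# Deviation sums Σx1xi and Σx1² from Worksheet No. 121.
sum_x1xi = {"X2": 1990.34, "X3": 103.40, "X4": 160.00}
sum_x1_sq = 229.02

# Column (7): a = mean of X1 minus Σ b_i times mean of X_i.
a = means["X1"] - sum(b[v] * means[v] for v in b)

# Column (5): explained variation Σ b_i Σx1xi, then R² and R.
explained = sum(b[v] * sum_x1xi[v] for v in b)
r2 = explained / sum_x1_sq
r = math.sqrt(r2)
print(f"a = {a:.2f}, R^2 = {r2:.4f}, R = {r:.4f}")
```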


STANDARD ERROR OF ESTIMATE FOR MULTIPLE
REGRESSION

Formula No. 117

$$S_1^2 = \frac{\Sigma x_1^2 - (b_{12.34}\Sigma x_1x_2 + b_{13.24}\Sigma x_1x_3 + b_{14.23}\Sigma x_1x_4)}{N - M} = \frac{\Sigma x_1^2 - \Sigma x_1'^2}{N - M}$$

$$S_1^2 = \frac{229.02 - 148.95}{19 - 4} = \frac{80.07}{15} = 5.34, \qquad S_1 = 2.31$$

Since $R^2_{1.234}$ is the proportion of the total variation in X1 that is associated with all the independent variables combined,

$1 - R^2_{1.234}$ = the proportion of variation in X1 that is not explained by X2, X3, X4, etc.

$\Sigma x_1^2$ = total amount of variation in X1.

$(b_{12.34}\Sigma x_1x_2 + b_{13.24}\Sigma x_1x_3 + b_{14.23}\Sigma x_1x_4)$ = total amount of variation in X1 that is explained by or associated with variation in X2, X3, X4, etc. Therefore,

$$\Sigma x_1^2 - (b_{12.34}\Sigma x_1x_2 + b_{13.24}\Sigma x_1x_3 + b_{14.23}\Sigma x_1x_4 + \text{etc.})$$

is the portion of variation in X1 that is not associated with the included independent variables. In this problem these values are:

$\Sigma x_1^2$ = 229.02, the total variation to be explained
$(b_{12.34}\Sigma x_1x_2 + \text{etc.})$ = 148.95, the total amount of variation explained by X2, X3, and X4
$\Sigma x_1^2 - \Sigma x_1'^2$ = 80.07, the amount of variation not explained

$$S_1^2 = \frac{80.07}{N - M} = \frac{80.07}{19 - 4} = \frac{80.07}{15} = 5.34, \qquad S_1 = 2.31$$
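Formula No. 117 as a short sketch, using the totals from Worksheets No. 121 and No. 123 (names are illustrative):

```python
import math

sum_x1_sq = 229.02     # total variation in X1 (Σx1²)
explained = 148.95     # Σ b Σx1x for X2, X3, X4 (Worksheet No. 123)
n, m = 19, 4           # N farms, M variables (X1 through X4)

unexplained = sum_x1_sq - explained      # Σx1² − Σx1'²
s1_sq = unexplained / (n - m)            # Formula No. 117
s1 = math.sqrt(s1_sq)
print(f"unexplained = {unexplained:.2f}, S1^2 = {s1_sq:.2f}, S1 = {s1:.2f}")
```

Dividing by N − M rather than N is what makes the estimate grow as more variables are added to a sample of fixed size.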


CORRECTION OF COMPUTATIONS FOR SIZE OF SAMPLE

Ezekiel develops the following set of corrections for coefficients computed from small samples. The standard error for multiple correlation increases (1) with an increase in the number of variables employed and (2) with a decrease in the size of the sample.

N = the number of paired items of data used
M = the number of variables included

Following the rule for degrees of freedom, the correction factor becomes

$$\frac{N - 1}{N - M}$$

which in this case is

$$\frac{19 - 1}{19 - 4} = \frac{18}{15}$$

Applied to $R^2_{1.234}$ = .6504, the coefficient of multiple correlation corrected for size of sample and number of variables becomes:

Formula No. 118

$$\bar R^2 = 1 - (1 - R^2)\,\frac{N - 1}{N - M} = 1 - (1 - .6504)\,\frac{18}{15} = 1 - .4195 = .5805, \qquad \bar R = .762$$
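Formula No. 118 in code, as a minimal sketch with the values of this problem:

```python
import math

r2 = 0.6504      # R² 1.234 before correction
n, m = 19, 4     # N items, M variables

# Formula No. 118: scale the unexplained part by (N - 1) / (N - M).
r2_adj = 1 - (1 - r2) * (n - 1) / (n - m)
r_adj = math.sqrt(r2_adj)
print(f"corrected R^2 = {r2_adj:.4f}, corrected R = {r_adj:.3f}")
```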
PARTIAL CORRELATION

Multiple correlation measures the total combined association of the independent variables with the dependent variable but gives no measure of the relative importance of the several independent factors. In the problem above $R^2_{1.234}$ = .6504, but this does not show how much of this total is due to (1) wheat yield, (2) number of cows, and (3) size of farm. This additional information is usually desirable and sometimes necessary, but it must be obtained from additional computations.

In this text three methods are presented for measuring the 
association of each independent variable with the dependent 
variable after the influence of the other factors is removed. 

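1. Separate determination is given in Formulas No. 113 and No. 114. For the present problem the values for the three variables are:

$$d_{12.34} = \frac{b_{12.34}\Sigma x_1x_2}{\Sigma x_1^2} = \frac{4.1996}{229.02} = .0183$$

$$d_{13.24} = \frac{b_{13.24}\Sigma x_1x_3}{\Sigma x_1^2} = \frac{17.5428}{229.02} = .0766$$

$$d_{14.23} = \frac{b_{14.23}\Sigma x_1x_4}{\Sigma x_1^2} = \frac{127.208}{229.02} = .5554$$

$$R^2_{1.234} = .6504 = \text{Total}$$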
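The separate determinations can be checked in a few lines; the sketch below uses the printed coefficients and deviation sums and confirms that the three parts add up to the total determination.

```python
b = {"X2": 0.00211, "X3": 0.16966, "X4": 0.79505}        # net regression coefficients
sum_x1xi = {"X2": 1990.34, "X3": 103.40, "X4": 160.00}   # Σx1xi (Worksheet No. 121)
sum_x1_sq = 229.02                                        # Σx1²

# Separate determination of each factor: b_i Σx1xi / Σx1².
d = {v: b[v] * sum_x1xi[v] / sum_x1_sq for v in b}
total = sum(d.values())
for v, value in d.items():
    print(f"d for {v}: {value:.4f}")
print(f"total: {total:.4f}")
```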


Separate determination has the advantage that the separate determinations exactly equal the total determination, but it has the great disadvantage that it is erratic and undependable whenever there is a high degree of inter-correlation between two or more of the independent variables. Also, its standard error is not determinable.¹

2. Partial correlation is the standard and most widely used method of measuring the separate values of several variables. Ezekiel gives perhaps the best definition: "The coefficient of partial correlation may be defined as a measure of the extent to which that part of the variation in the dependent variable which was not explained by the other independent factors can be explained by the addition of the new factor."² If (1) size of farm and (2) wheat yields explain .5943 of the variation in farm income, how much more of the variation would be explained by including (3) the number of cows? The portion 1 − .5943 = .4057 is yet unexplained. Including cows raises the total determination to .6504, an increase of .6504 − .5943 = .0561. Adding the variable on cows therefore explained .0561 of the yet unexplained variation in X1 of .4057:

$$\frac{.0561}{.4057} = .1383,$$

the amount of additional explanation of variation supplied by including cows.

The formulas for partial correlation for three independent variables are³

$$r^2_{12.34} = 1 - \frac{1 - R^2_{1.234}}{1 - R^2_{1.34}}, \qquad r^2_{13.24} = 1 - \frac{1 - R^2_{1.234}}{1 - R^2_{1.24}}, \qquad r^2_{14.23} = 1 - \frac{1 - R^2_{1.234}}{1 - R^2_{1.23}}$$

in which the numerator of the fraction, $(1 - R^2_{1.234})$, remains the same for all equations, but the denominator varies for each computation. The denominator in the equation for each variable is 1 minus the multiple determination coefficient for all the independent factors except the one being measured. For $r^2_{12.34}$ the multiple determination is $R^2_{1.34}$, which omits X2. For $r^2_{13.24}$ this multiple determination is $R^2_{1.24}$, which omits X3. All the formulas follow the same principle.

¹ Mordecai Ezekiel, Methods of Correlation Analysis, pp. 498-99.

² Ibid., p. 214. Quoted by permission of the author and John Wiley & Sons, Inc.

³ There is another method for computing partial correlation, based on building up the partial coefficients from successive simple correlation coefficients, which has been in wide use for many years. It is not presented here because it reaches the same results obtained by the shorter method given above. The student may find it in many of the more advanced texts or older books, such as Mills, F. C., Statistical Methods, Revised, pp. 554-60, or Pearson and Bennett, Statistical Methods Applied to Agricultural Economics, p. 191.
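The partial-correlation formula reduces to a one-line function. In this sketch the reduced determinations are the figures the text uses for this problem (.6465, .5943, .1806); the function name is hypothetical. The final assertion shows that the formula is the same as the worked reasoning above, gain in R² divided by the variation still unexplained.

```python
def partial_r2(r2_all, r2_without):
    """Squared partial correlation: the share of the variation left unexplained
    by the other factors that the added factor explains."""
    return 1 - (1 - r2_all) / (1 - r2_without)

r2_all = 0.6504                           # R² 1.234
r2_12_34 = partial_r2(r2_all, 0.6465)     # R² 1.34 omits X2
r2_13_24 = partial_r2(r2_all, 0.5943)     # R² 1.24 omits X3
r2_14_23 = partial_r2(r2_all, 0.1806)     # R² 1.23 omits X4

# The same quantity written as (gain in R²) / (variation still unexplained).
assert abs(r2_13_24 - (r2_all - 0.5943) / (1 - 0.5943)) < 1e-12
print(f"{r2_12_34:.4f}  {r2_13_24:.4f}  {r2_14_23:.4f}")
```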

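To compute the partial correlation for each factor it is necessary first to compute the several multiple determinations, successively eliminating the desired factor. This requires the simultaneous solution of the required equations for the new regression coefficients. This computation is neither long nor difficult, because the same equations developed in Worksheet No. 121 for the original total multiple determination may be used by omitting successively one term, as

$$\text{for } R^2_{1.34}: \quad \begin{aligned} \Sigma x_3^2\, b_{13.4} + \Sigma x_3x_4\, b_{14.3} &= \Sigma x_1x_3 \\ \Sigma x_3x_4\, b_{13.4} + \Sigma x_4^2\, b_{14.3} &= \Sigma x_1x_4 \end{aligned}$$

$$\text{for } R^2_{1.24}: \quad \begin{aligned} \Sigma x_2^2\, b_{12.4} + \Sigma x_2x_4\, b_{14.2} &= \Sigma x_1x_2 \\ \Sigma x_2x_4\, b_{12.4} + \Sigma x_4^2\, b_{14.2} &= \Sigma x_1x_4 \end{aligned}$$

$$\text{for } R^2_{1.23}: \quad \begin{aligned} \Sigma x_2^2\, b_{12.3} + \Sigma x_2x_3\, b_{13.2} &= \Sigma x_1x_2 \\ \Sigma x_2x_3\, b_{12.3} + \Sigma x_3^2\, b_{13.2} &= \Sigma x_1x_3 \end{aligned}$$

The equations to be taken from Worksheet No. 121 for solutions are:

$$R^2_{1.34}: \quad \begin{aligned} (1)\quad 358.63\, b_{13.4} + 48.63\, b_{14.3} &= 103.40 \\ (2)\quad 48.63\, b_{13.4} + 186.63\, b_{14.3} &= 160.00 \end{aligned}$$

$$R^2_{1.24}: \quad \begin{aligned} (1)\quad 193{,}424.96\, b_{12.4} + 1{,}596.58\, b_{14.2} &= 1{,}990.34 \\ (2)\quad 1{,}596.58\, b_{12.4} + 186.63\, b_{14.2} &= 160.00 \end{aligned}$$

$$R^2_{1.23}: \quad \begin{aligned} (1)\quad 193{,}424.96\, b_{12.3} + 1{,}842.58\, b_{13.2} &= 1{,}990.34 \\ (2)\quad 1{,}842.58\, b_{12.3} + 358.63\, b_{13.2} &= 103.40 \end{aligned}$$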


WORKSHEET NO. 124
Solution of Simultaneous Equations for R²1.34, R²1.24, and R²1.23

R²1.34

Front Solution

          b13.4          b14.3
I         358.63       + 48.63          =  103.40
I'        -1.0000      - .135599        =  -.288319
II        48.63        + 186.63         =  160.00
          -48.63       - 6.59418        =  -14.02094
2                        180.03582      =  145.97906
II'                      -1.000         =  -.810833

Back Solution

b13.4        b14.3
.288319      .810833
-.109948
.178371      .810833

      Regression      Equation I    Computation of R²1.34
      Coefficients
X3    .178371         103.40        18.444
X4    .810833         160.00        129.733
                                    148.177

$$R^2_{1.34} = \frac{148.177}{229.02} = .6470$$

$$r^2_{12.34} = 1 - \frac{1 - .6504}{1 - .6465} = 1 - \frac{.3496}{.3535} = 1 - .9889 = .0111$$

This coefficient of partial correlation means that adding size of farms explains only 1.11% of the variation in income that remained unexplained before size of farms was included in the problem. In this sample, size of farm, X2, has little association with rate of income.


R²1.24

WORKSHEET NO. 125
Front Solution

          b12.4             b14.2
I         193,424.96     + 1,596.58       =  1,990.34
I'        -1.0000        - .0082543       =  -.01029
II        1,596.58       + 186.63         =  160.00
          -1,596.58      - 13.17865       =  -16.42886
2                          173.45135      =  143.57114
II'                        -1.000         =  -.806132

Back Solution

b12.4        b14.2
.010290      .806132
-.006654
.003636      .806132

      Regression      Equation I    Computation of R²1.24
      Coefficients
X2    .003636         1,990.34      7.2369
X4    .806132         160.00        128.9811
                                    136.2180

$$R^2_{1.24} = \frac{136.2180}{229.02} = .5943$$

$$r^2_{13.24} = 1 - \frac{1 - .6504}{1 - .5943} = 1 - \frac{.3496}{.4057} = 1 - .8617 = .1383$$

This coefficient of partial correlation means that adding number of cows explains 13.83% of the variation in income that remained unexplained before number of cows was included in the problem. In this sample, number of cows on farms does not have a large degree of association with rate of income, but it is much more important than size of farms.


WORKSHEET NO. 126

R²1.23

Front Solution

          b12.3             b13.2
I         193,424.96     + 1,842.58       =  1,990.34
I'        -1.0000        - .0095261       =  -.01029
II        1,842.58       + 358.63         =  103.40
          -1,842.58      - 17.5526        =  -18.9601
2                          341.0774       =  84.4398
II'                        -1.0000        =  -.247567

Back Solution

b12.3        b13.2
.01029       .247567
-.002358
.007932      .247567

      Regression      Equation I    Computation of R²1.23
      Coefficients
X2    .007932         1,990.34      15.78738
X3    .247567         103.40        25.59843
                                    41.38581

$$R^2_{1.23} = \frac{41.38581}{229.02} = .1806$$

$$r^2_{14.23} = 1 - \frac{1 - .6504}{1 - .1806} = 1 - \frac{.3496}{.8194} = 1 - .4266 = .5734$$

This coefficient of partial correlation means that adding wheat yields explains 57.34% of the variation in income that remained unexplained before wheat yields were included in the problem. In this sample, wheat yield is by far the most important factor associated with rate of income.

BETA COEFFICIENTS

As was stated earlier in this chapter, it is possible to measure the degree of association between the dependent variable and each of the independent variables by weighting each independent factor with its multiple regression coefficient. This measure requires, however, that the value of each factor be stated in "standard measures," that is, in terms of its own standard deviation. The computation reduces the variation in all variables to a common measure, a kind of common denominator. When the variation in each of X1, X2, X3, ..., Xn is expressed in units of σ1, σ2, σ3, ..., σn, the regression coefficients weighted by the ratios between the standard deviations are called the β coefficients and are expressed as follows:

$$\frac{x_1}{\sigma_1} = \beta_{12.34}\,\frac{x_2}{\sigma_2} + \beta_{13.24}\,\frac{x_3}{\sigma_3} + \beta_{14.23}\,\frac{x_4}{\sigma_4}$$

and the partial betas as

$$\beta_{12.34} = b_{12.34}\,\frac{\sigma_2}{\sigma_1}, \text{ etc.}$$

The computations for this problem are, by Formulas No. 122, 123, and 124,

$$\beta_{12.34} = b_{12.34}\,\frac{\sigma_2}{\sigma_1} = .00211\,\frac{\sigma_2}{\sigma_1} = .0613$$

$$\beta_{13.24} = b_{13.24}\,\frac{\sigma_3}{\sigma_1} = .16966\,\frac{\sigma_3}{\sigma_1} = .212$$

$$\beta_{14.23} = b_{14.23}\,\frac{\sigma_4}{\sigma_1} = .79505\,\frac{\sigma_4}{\sigma_1} = .7163$$

The squared betas are:

            Beta      Squared Beta
β12.34      .0613     .0038
β13.24      .2120     .0449
β14.23      .7163     .5109
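The beta weights can be sketched from the sums of squared deviations, since σi/σ1 = √(Σxi²/Σx1²) whichever divisor is used for the standard deviations. With these unrounded ratios β14.23 comes out near .718 rather than the printed .7163, a difference that presumably reflects rounding of the standard deviations in the hand computation; the other two agree with the text.

```python
import math

b = {"X2": 0.00211, "X3": 0.16966, "X4": 0.79505}   # net regression coefficients
# Sums of squared deviations (Worksheet No. 121); sigma_i / sigma_1 equals
# sqrt(Σxi² / Σx1²) because the divisor N cancels.
sum_sq = {"X1": 229.02, "X2": 193424.96, "X3": 358.63, "X4": 186.63}

beta = {v: b[v] * math.sqrt(sum_sq[v] / sum_sq["X1"]) for v in b}
beta_sq = {v: beta[v] ** 2 for v in beta}
for v in b:
    print(f"beta 1{v[1:]}: {beta[v]:.4f}   squared: {beta_sq[v]:.4f}")
```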
TABLE 43

Comparison of Separate Determination, Partial Correlation, and Beta Coefficients

Variables              d        Partial r²    Betas    Betas Squared
X2  Size of Farms      .0183    .0111         .0613    .0038
X3  Number of Cows     .0766    .1383         .2120    .0449
X4  Wheat Yields       .5554    .5734         .7163    .5109

Although these several measures for each variable are not identical, they rank in the same order in each case and, in fact, tell the same story. They check and corroborate each other.

MULTIPLE CURVILINEAR CORRELATION

The formulas and methods presented in this chapter up to this point are all straight-line equations and methods. As has already been shown, many relationships are curvilinear. Straight-line methods do not give the best results when the relationships are truly curvilinear.

Mathematical Curvilinear Methods

Several formulas for curvilinear regression were presented in the previous chapter. Any combination of them may be used in curvilinear correlation problems. The straight-line multiple regression line

$$X_1 = a + b_{12.34\ldots n}X_2 + b_{13.24\ldots n}X_3 + \cdots + b_{1n.23\ldots(n-1)}X_n$$

may be expanded to fit any set of curvilinear factors by combinations such as the following examples:

Example 1

1. Relationship between X1 and X2: simple parabola

2. Relationship between X1 and X3: straight line

3. Relationship between X1 and X4: cubic parabola

Such a regression line would be:

$$X_1 = a + b_{12.34\ldots n}X_2 + b'_{12.34\ldots n}X_2^2 + b_{13.24\ldots n}X_3 + b_{14.23\ldots n}X_4 + b'_{14.23\ldots n}X_4^2 + b''_{14.23\ldots n}X_4^3$$

Such an equation contains seven unknowns and is very long and tedious to compute.

Example 2

1. Relationship between X1 and X2: parabolic log line

2. Relationship between X1 and X3: cubic log parabola

3. Relationship between X1 and X4: simple log line

Such relationships would be expressed by the equation:

$$\log X_1 = \log a + b_{12}\log X_2 + b'_{12}(\log X_2)^2 + b_{13}\log X_3 + b'_{13}(\log X_3)^2 + b''_{13}(\log X_3)^3 + b_{14}\log X_4$$

Usually it is a waste of time to compute such expensive mathematical lines in practical problems unless the statistician knows before the work is done that such lines fit the data well. Such lines are definitely beyond the limits of an elementary text. The references at the end of this chapter are adequate to guide students who may wish to study such regression lines.

FREE-HAND MULTIPLE CORRELATION

Quite dependable free-hand methods for estimating multiple and partial correlation have been developed during the past fifteen years. Although they are much briefer and less expensive than mathematical curvilinear methods, they are too complex and advanced to be included at length in an elementary text.¹

Multiple and partial correlation combined form one of the most powerful and useful methods of advanced statistical research available. If the elementary student does not master them, he would do well to become acquainted with their basic principles and simpler methods as a means of appreciating and interpreting more intelligently much of the material with which he will be confronted in many textbooks and in corporation and governmental reports. Such knowledge is useful from the standpoint of consuming as well as of producing statistics.

¹ Ezekiel, M., Methods of Correlation Analysis, Chapters 14, 15, 16, and 17, gives an adequate presentation of these methods. Also, Pearson, F. A., and Bennett, K. R., Statistical Methods Applied to Agricultural Economics, Chapter 13, gives a sufficient presentation of the methods.


LIMITATIONS AND USES OF MULTIPLE AND
PARTIAL CORRELATION

Straight-line multiple correlation assumes that the relationships among the variables are (1) linear and (2) additive. Often neither assumption is entirely correct. As has been indicated earlier, many, perhaps most, relationships in the social and biological sciences, as well as in the physical sciences, are not straight-line. In some cases the deviations from linear equations are not sufficient to invalidate the results; in others they are. The researcher should be well enough informed in his field of study to know whether linear relations are sufficient or whether he must use curvilinear equations. Multiple curvilinear equations are long and laborious to use and will often be avoided for this reason unless they are necessary. The methods of computing them are not given in this text but may be found in several more advanced texts.¹

The second assumption, that the relationships are additive, 
often does not hold. The assumption of additive relationships 
means that each variable increases or decreases without being 
affected by the other variable, just as the total weight of a loaded 
truck is equal to the weight of the truck plus the weight of the 
load. Combining such factors does not increase their total. In 
the fields of the social and biological sciences this assumption 
often does not hold. In many cases when two or more independ- 
ent factors are brought together they react on each other. Fer- 
tilizer and rainfall are an example. Since a given amount of 

^ Pearson, F. A., and Bennett, K. R., Statistical Methods Applied to Agri- 
culture, John Wiley <fe Sons, Inc., New York, 1942; Ezekiel, M., Multiple 
Correlation Analysis, John Wiley & Sons, Inc., New York, 1941; and Mills, 
F, C., Statistical Methods^ Revised, Henry Holt and Co., New York, 1938. 
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moisture is required to dissolve the fertilizer and bring its elements 
to the roots of the plant, a small amount of rain may make little 
or none of the fertilizer available, but a little more rain may make 
it all available. The two factors support and multiply each 
other. Fire and gunpowder are perhaps a violent illustration of 
joint action. It is a problem of which the statistician must al- 
ways take cognizance. 
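The difference between additive and joint action can be made concrete with a small Python sketch. The two yield functions and all their coefficients are hypothetical, invented only for illustration and not taken from any crop data. Under the additive model the gain from one unit of fertilizer is the same at every level of rainfall; under the joint-action model it depends on how much rain falls.

```python
# Hypothetical coefficients, chosen only to illustrate the idea; they are
# not estimates from any real fertilizer or rainfall data.
def additive_yield(fertilizer, rain):
    """Additive model: each factor adds its own fixed amount (truck plus load)."""
    return 10.0 + 2.0 * fertilizer + 1.5 * rain

def joint_yield(fertilizer, rain):
    """Joint-action model: fertilizer pays off only in proportion to rain."""
    return 10.0 + 1.5 * rain + 0.8 * fertilizer * rain

def fertilizer_effect(model, rain, low=0.0, high=1.0):
    """Change in yield when fertilizer rises from `low` to `high` at fixed rain."""
    return model(high, rain) - model(low, rain)

for rain in (0.0, 1.0, 3.0):
    print(f"rain={rain}: additive {fertilizer_effect(additive_yield, rain):.2f}, "
          f"joint {fertilizer_effect(joint_yield, rain):.2f}")
```

Under the joint-action model the fertilizer effect is zero without rain and grows with it, so a single straight-line net coefficient for fertilizer would misstate its effect at every level of rainfall.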

Multiple correlation can be used effectively on smaller samples than
tabular analysis requires; but as the number of variables included
in the problem increases, the size of the sample must increase in
order to keep the standard error within safe limits. For four
variables a sample of at least 50 items is usually necessary for
dependable results, and a still larger sample would be preferable.
Much of the unfavorable criticism to which the results of multiple
correlation have been subjected is due to the use of samples that
were too small and to the additional fact that the studies were not
carefully and logically planned. The more powerful a research tool
is, the greater must be the care with which it is used. Multiple and
partial correlation, if carefully and logically applied to samples
of adequate quality and size in problems which have been properly
planned, is one of the most powerful engines of analysis available
to the statistician. It is not a device for novices. To obtain good
results the researcher must be thoroughly informed in the entire
field of science in which his problem lies.

The standard error of the coefficient of multiple correlation is 
computed from 

Formula No. 125



For partial correlation coefficients the formula is similar.
Formula No. 126

In both formulas the symbol M equals the number of inde-
pendent variables included in the problem.
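As an assumption only, the Python sketch below uses one form of the standard error of R found in a number of texts, (1 − R²)/√(N − M − 1), with M independent variables; it is not necessarily the author's Formula No. 125. It illustrates the point made above about sample size: for a fixed N, every independent variable added removes one degree of freedom from the denominator and so enlarges the standard error. The function name and the value R = 0.6 are hypothetical.

```python
import math

def se_multiple_r(R, N, M):
    """Approximate standard error of R with M independent variables, using
    the assumed form (1 - R**2) / sqrt(N - M - 1)."""
    dof = N - M - 1
    if dof <= 0:
        raise ValueError("sample too small for this many independent variables")
    return (1 - R ** 2) / math.sqrt(dof)

# Hypothetical R = 0.6 with three independent variables (four variables in all).
for N in (20, 50, 100):
    print(N, round(se_multiple_r(0.6, N, 3), 3))
```

With these figures the standard error falls from 0.16 at N = 20 to about 0.09 at N = 50.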

Multiple and partial correlation have found wide use in the 
various fields of agricultural science. They are also widely used 
in the testing of variables to be included in indexes. Sociology, 
psychology, economics, education, and market analysis and price 
forecasting are among the larger fields in which they have proved 
useful. 


SUMMARY 

1. The coefficient of multiple correlation expresses the total combined 
relationship between a group of independent variables and one dependent 
factor. 

2. Partial correlation expresses the degree of net relationship between
a dependent factor and each one of a group of independent variables
after the influence of the other independent factors has been computed
out, or removed.

3. Since most effects are the result of a number of factors, multiple
and partial correlation analysis is essential to obtain an accurate measure
of the influence of each separate factor in the total complex result.

4. The symbol for straight-line multiple correlation is R, and for
curvilinear multiple correlation it is capital rho, Ρ.

5. To prevent errors in the computation of the long net regression
and correlation equations, a check-sum device is important.

6. The Doolittle method of solving simultaneous equations is es- 
pecially helpful in the solution of such long multiple equations. 

7. The beta coefficients, based on the standard deviations of the vari-
ables and the net regression coefficients, are an alternative device for
computing the net effect of each independent factor.

8. The beta coefficients will not be identical with the partial coeffi- 
cients, but will rank in the same order in size. 

9. The several coefficients of separate determination add up to the
coefficient of total determination, but they are not dependable when the
intercorrelation between two or more of the independent variables is high.

10. Multiple and partial determination and correlation are highly
effective and powerful devices for statistical analysis, but, like any
other complicated mechanism, they require a maximum of exact knowledge
and skill in their use. Any problem to which they are applied should
be carefully planned and should rest on sufficient dependable data.

11. As the number of variables included in a multiple correlation 
problem increases, the number of items in the sample must increase if 
dependable results are to be obtained. 
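Several of the summary points (the check-sum device, the solution of the simultaneous equations, the beta coefficients, and separate determination) can be tied together in one Python sketch. It solves the normal equations in standard measures for the beta coefficients by plain Gaussian elimination, the forward-elimination and back-substitution idea on which the Doolittle method rests, though without the Doolittle tabular layout, and it carries a check column through every step. The simple correlations and all function names are hypothetical, chosen for illustration.

```python
import math

def solve_with_check_sums(A, b):
    """Solve A x = b by forward elimination and back substitution, carrying
    a check column through every step as a guard against arithmetic slips."""
    n = len(A)
    rows = [list(A[i]) + [b[i]] for i in range(n)]
    for row in rows:
        row.append(sum(row))  # check column: coefficients plus right-hand side
    for k in range(n):
        for i in range(k + 1, n):
            factor = rows[i][k] / rows[k][k]
            rows[i] = [xi - factor * xk for xi, xk in zip(rows[i], rows[k])]
            # The same operation was applied to the check entry, so it must
            # still equal the sum of the other entries in the row.
            if not math.isclose(rows[i][-1], sum(rows[i][:-1]), abs_tol=1e-9):
                raise ArithmeticError(f"check sum failed in row {i}")
    x = [0.0] * n
    for i in reversed(range(n)):
        rhs = rows[i][n] - sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = rhs / rows[i][i]
    return x

def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3 from three simple coefficients."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical simple correlations: X1 dependent, X2 and X3 independent.
r12, r13, r23 = 0.6, 0.5, 0.3
# Normal equations in standard measures:
#   beta2 + r23 * beta3 = r12
#   r23 * beta2 + beta3 = r13
beta2, beta3 = solve_with_check_sums([[1.0, r23], [r23, 1.0]], [r12, r13])
separate = [beta2 * r12, beta3 * r13]   # coefficients of separate determination
R2 = sum(separate)                       # coefficient of multiple determination
print(beta2, beta3, separate, R2, math.sqrt(R2))
print(partial_r(r12, r13, r23), partial_r(r13, r12, r23))  # r12.3 and r13.2
```

With these figures beta2 is about 0.49 and beta3 about 0.35; the two coefficients of separate determination add up to R² of about 0.47, and the partial coefficients r12.3 (about 0.54) and r13.2 (about 0.42) rank in the same order as the betas. A net regression coefficient in original units can be recovered as b1i = βi σ1/σi.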
REVIEW QUESTIONS

1. What is the difference between simple and multiple correlation? 

2. Why should multiple correlation be studied by beginners in sta- 
tistics? 

3. What is meant by “weighting” the independent variables?

4. Why must the variables be stated in “standard measures,” or in
terms of their standard deviations, in multiple correlation?

5. What are the advantages of the Doolittle Method of solving equa- 
tions? 

6. What is separate determination? What are its advantages and its 
weaknesses? 

7. What is partial correlation? Exactly what does it measure? Ex- 
plain. 

8. What is the “beta” coefficient? What does it measure? Explain
in detail. 




CHAPTER 25 


PREPARATION OF STATISTICAL 
REPORTS 


A statistical study is usually undertaken to answer some 
specific question. When the study is completed it is necessary to 
state the results in a form that is appropriate to the subject 
matter and to the persons who are to use it. It has often been 
taken for granted either (1) that anyone could prepare an adequate
statistical report or (2) that this was a matter beyond the province
of the statistician. Neither assumption is correct. It is difficult
to prepare a good and adequate statistical write-up of a research
problem, as many students and statisticians who have attempted it
have discovered. One large business firm offered a large salary to
anyone who could train its personnel to prepare good reports. With
the greatly increased use of statistics in the affairs
of daily life the preparation of clear, concise, logical, and effective 
statistical reports is a necessary practical accomplishment for 
many persons. The student should be trained in this skill early 
in his college course. It will pay him large dividends in his re- 
search work and in his business career. Teachers should not 
neglect it. 

TYPES OF STATISTICAL REPORTS 

The type of report in which a statistical study terminates de- 
pends upon (1) the subject matter and (2) the persons for whom 
it is prepared. Three general types cover the field fairly well. 
They are (1) the scientific report, (2) the business report, and 
(3) the popular report. There are, of course, wide variations 
within each of these classes, and many reports combine two or
more of them.

Form of Scientific Report 

A strictly scientific report is designed primarily for scientists. 
It takes for granted on the part of the reader a sufficient knowl-
edge of the technical terms employed. It assumes on the part of
the reader an adequate understanding of the scientific methods and 
mathematical computations involved. The scientific report, 
therefore, is brief and strikes directly to the heart of the problem. 
It omits all unnecessary detailed and routine computations and 
states the results of the study in summary form. Only new or 
unusual techniques are explained in detail. 

Outline of Scientific Study. Although there is some variety
in the form of scientific reports, in the main they follow rather 
closely the following outline: 

1. General title or statement of the problem 

2. Author or authors 

3. Institution or auspices under which the 
study is conducted 

4. Introduction 

5. Statement of methods 

6. Body of report 

7. Summary 

8. Bibliography 

9. Footnotes 

1. The title of a scientific study should be an exact and full 
statement of the question studied. It should be framed to reveal 
information and not as a catch phrase. The following are fair 
examples: 

“The Quantitative Estimation of Relative Concentrations of the
Viruses of Ordinary and Yellow Tobacco Mosaics and Tomato Spotted
Wilt by the Primary Lesion Method.”¹

¹ By Rupert J. Best, from the Waite Agricultural Research Institute,
University of Adelaide, South Australia.
or

“Land Utilization and Classification in New York and Its
Relation to Roads, Electricity and Reforestation.”¹

The title should make clear at once the nature and scope of the 
study. 

2. The names of authors are usually stated without title or 
comment. 

3. The university, foundation, society, or other institution under
whose authority, aid, or auspices the study is made should be
concisely stated after the author’s name at the beginning of the
study, as is indicated in the footnotes on this and the previous pages.

4. The introduction is a brief statement of from one to three 
pages setting forth (1) the origin of the problem, (2) its develop- 
ment up to the present time and (3) the proposed advances or 
additions to be made in the present study. The length of the 
introduction will depend on the age and nature of the problem 
but should be no longer than is needed to orient the reader in the 
subject. 

5. The statement of methods should be as brief as is consistent with
covering the essential attacks on the problem. These should include
(1) methods of collecting data, (2) experimental design, (3) sam- 
pling, (4) coding, (5) statistical measures used, and (6) all ex- 
traordinary or unusual procedures or methods. It should make 
perfectly clear the nature and limitations of the data on which the 
study is based. 

6. The body of the report is the logical systematic statement 
and explanation of the several points or principles covered in the 
study and of the relationships which have been discovered to ob- 
tain among them. It will usually contain: 

a. Graphic presentation of data and the relationships which 
exist among them. 

b. Tables of ratios, coefficients, equations, indexes and errors 
which express in mathematical terms the relations among the 
data and variables of the study. 

c. Explanation and discussion of the several relationships
discovered as revealed in the graphs, coefficients, ratios and equations.

¹ By T. E. LaMont, of the New York State Land Survey, Bulletin 372,
March, 1937, New York State College of Agriculture.

The body of the report will be developed in some logical se- 
quence. Usually it will begin with the several details of specific 
data and move toward a summary conclusion through the in- 
ductive method. The strength, weakness, force, and limitations 
of each point and step should be fully stated and its bearing on 
the entire problem pointed out. Statistical methods are merely 
devices for making logically correct thinking more exact and
dependable.

7. The summary is a recapitulation in numerical order of the 
more significant points and conclusions of the study. It is a final 
brief statement of the results of the research. It should be clear, 
to the point, and free from all overstatement. It is unethical for
a statistician to claim in his conclusions more than his data and 
study substantiate. 

8. A complete bibliography or roll of sources used or referred to 
in the study should appear at the end of the report in alphabetical 
order. This part of the report is very important. Those who 
examine your work may wish to pursue the study further and to go 
back to the sources of the material. The bibliography at the end 
of this text is an example. 

9. Adequate footnotes should appear at the bottom of every page 
or at the end of the study, giving full recognition to every source
used. 

Perhaps the best practice for the student who wishes to pre- 
pare a scientific report is to examine a number of such reports in 
the document room of the library. All college libraries and most 
of the larger city libraries will be well supplied with such reports. 
These documents will vary slightly from one field to another. 
The student should study the best reports in the field in which he 
is interested. 

Business Reports 

Statistical reports in business may be roughly divided into 
three classes: (1) research problems in business, (2) special mana- 
gerial reports, and (3) routine reports. Although each of these 
three classes is a distinct type, there is a wide variation within
each class. 

Research Reports in Business. Such reports are the most 
pretentious and formal in this field of statistics. They are, in 
fact, closely related to the scientific reports discussed above and 
may be quite long and represent a large amount of careful study. 
They range all the way from monographs and small books to 
reports containing ten or a dozen pages. Among the most common 
fields touched by such studies are (1) location of plants or indus- 
tries, (2) sources of raw materials, (3) power, (4) labor problems, 
(5) working conditions, (6) population problems, (7) consumer 
purchasing power, (8) costs and standards of living, (9) trade 
territories, (10) market analysis, (11) transportation, (12) banking 
and credit problems, (13) foreign trade, and (14) international
exchange. These are, of course, only general fields, and thousands
of research studies are always in progress on special, delimited
problems within these general areas.

Outline of Business Report 

1. Title 

2. Letter of submittal 

3. Index of general contents 

4. Index of graphs and tables 

5. Body of report 

6. Summary 

7. Bibliography 

8. Appendixes 

1. The title of the report with the name of the author should 
appear on the cover page. The title should convey a clear idea 
of the general contents of the report. It is well for it to answer 
the three questions: What? Where? and When? as 

Report on the Industrial and Economic Situation in Chile, 
Nov., 1927, by W. F. Vaughn Scott 
or 

Relation of Central Market Prices of Strawberries to Pro- 
duction Planning, 1932, by Orville J. Hall, Fayetteville, 
Arkansas. 
2. A letter of submittal is not necessary to all business reports
but is proper in reports which have been authorized by some 
authority such as a government body or official, a board of di- 
rectors, corporation president, or department head. 

The following letter of submittal by the National Resources 
Board is an example: 

National Resources Board 
Interior Building 
Washington 
November 28, 1934 

The President 

The White House 
Washington, D.C. 

My Dear Mr. President: 

We have the honor to transmit herewith the report of the National 
Resources Board with the supporting documents used in its preparation. 

This report carries this significance: That it is the first attempt in 
our national history to make an inventory of our national assets and of 
the problems related thereto. Moreover, for the first time it draws to- 
gether the foresight of various planning agencies of the Federal Govern- 
ment and suggests a method for future cooperation. 

The members of the Board have not all had an opportunity to give 
full consideration to all of the points involved. However, they unani- 
mously agree in principle and desire to indicate to you their belief in 
the great importance of this study and of initiating steps toward the 
accomplishment of the broad program herein outlined. 

Very respectfully yours, 

(s) Harold L. Ickes

Harold L. Ickes 
Secretary of the Interior, 
Chairman 

This is the submittal letter of a pretentious 455-page study
which involved the entire United States and the labor of many 
persons. The submittal letter of many reports might be much 
shorter and less formal. 

3. The index of the general contents of a business report should
be quite complete. If the report is worthwhile, it may be used a
great deal. Ready reference to its contents is an economy of time. 

4. The index to the graphs and tables should be complete. Since
much of the report’s value will lie in its tables and illustrative
graphs as summary material, the reader should be able to locate any
desired material with the least delay.

5. The body of the report should be organized in a logical, co-
herent, progressive form. Beginning with the original data and
basic principles, each step should lead logically and naturally to
the next, up to the final summary.

Such outlines may take the form of (1) demand, (2) supply,
(3) price, and (4) costs; or (1) fixed capital costs, (2) other fixed 
costs, taxes, etc., (3) direct material costs, (4) direct labor costs, 
(5) service costs, and (6) transportation costs; or (1) population 
by classes, (2) income by classes, (3) sales outlets; or (1) labor 
turnover, (2) working conditions, (3) wage rates, (4) union ac- 
tivities, and (5) management and labor cooperation; or any 
other set of logical and coherent relationships which the problem 
and data suggest or determine. 

The body of the report is the report and it must stand or fall 
upon the soundness of its logical relations and statistical analysis. 
All this should be taken into careful consideration in the pre- 
liminary planning and organization of the project before the data 
are collected. A good business report cannot be made from a 
conglomerate of irrelevant data and conflicting logical relation- 
ships. The objective of the study must be clearly in view from 
the beginning.

6. The summary of a business report is as necessary as that of 
any scientific study. It should state in brief numerical order the 
points and principles established or suggested by the study. It 
should be unbiased. It should point out the weak as well as the 
strong points revealed in the research and should lay the foun- 
dation for a wise and dependable judgment. 

7. The bibliography should be complete and arranged in alpha-
betical order by authors, by subjects, or by both, for purposes of
easy reference.

8. The appendixes should include the original data, the prin-
cipal worksheets and computations, and other material on which
the tables, graphs, coefficients and percentages in the body of the 
report are based. In this respect a business report should be 
somewhat different from the report prepared for scientists. The 
report prepared for scientists usually omits most or all of those 
detailed materials which it is proper and necessary to include in 
the appendix of the business report. Since the business report is 
frequently used as a basis for making far-reaching managerial 
decisions, it should make available the full evidence on which such 
decisions can be made with the best measure of success.

Special Managerial Reports. Frequently a board of di- 
rectors, a business executive, a staff member, or committee will 
require a relatively small amount of statistical information and 
analysis for some immediate managerial problem. In such cases 
some statistician or statistical clerk, or perhaps some accountant, 
bookkeeper, filing clerk, or other capable employee will be re-
quested to get the data and make the computations. 

The usual characteristics of such reports are that they are brief 
and specific. They are sometimes, however, difficult to make. 
The very fact that they must be so exact and specific makes it 
necessary to discover the exact and detailed information, which is 
often not easy to do. The fact that such requests are usually 
made on short notice to meet some immediate need makes the 
problem doubly difficult. The board of directors may require 
some information for their meeting tomorrow or even for today. 
The statistical department of a business should anticipate such 
needs as far as possible and have their materials organized in such 
form that such necessary requests can be met with a minimum of 
delay and effort. 

Such special reports are usually quite informal and simply 
drawn. They depend on secondary data which may be recombined 
or reanalyzed for some immediate specific purpose. A firm with a 
modern mechanized statistical department may only need to re- 
sort and tabulate a portion of their punched cards to supply the 
information. Sometimes, however, outside and additional data 
must be obtained and analyzed. Such hurried reports are neces- 
sarily as brief and informal as possible. This requirement does 
not eliminate the necessity for source references, bibliography,
indexes of subject matter and graphs and tables. The essential 
parts of the report are present but are usually briefer and less 
formal. 


ROUTINE BUSINESS REPORTS 

Such reports consist of filling in prepared blanks. They may 
be made daily, weekly, or occasionally. The information may be 
that obtained by a traveling salesman during his day’s visits to
the firm’s customers as to orders, stocks, business conditions,
customer demands, credits, and such matters. The routine re- 
port may be that of the branch manager to the central office, or of 
a department head to the manager. 

When the printed form for such reports is made, the nature and 
limits of the reports are definitely fixed. The making of the 
original forms is, therefore, the principal concern of the statis-
tician in such cases. The forms for all routine reports should be
most carefully designed by the business executives who require
the information, with the aid of the firm’s statistician. The basic
principle to follow is to reduce such forms and reports to the 
minimum required for adequate management decisions and to 
make one report supply as many needs as can be done efficiently. 
Since the requirements of such reports range through an infinite 
variety, they must be evolved by each firm through tests and ex- 
perience and revisions. 

Popular Statistical Reports 

There has been for many years a continual increase in the use of 
popular statistical reports. The reason for this is the general de- 
sire to economize time and space. Statistical tables, graphs, 
ratios, and coefficients are the shorthand for presenting ideas.
Editors, writers, teachers, advertisers, lecturers, and others who
appeal to the popular audience have discovered that they can get
across to readers a larger amount of information more effectively and
more quickly by statistical devices than in any other way. It is 
probable that this trend will increase. 

It is much more difficult to advise students how to prepare a
popular statistical report than how to prepare a scientific or
business report, because the former has a much wider range and variety.
A popular report is less subject to exact and formal methods. It 
partakes of the nature of art and journalism. It is a device for 
presenting simple general ideas instead of making nice and exact 
distinctions. For this reason some statisticians abhor it and 
attempt to discourage it. Such efforts are fruitless, and a wiser
plan is to educate and direct statisticians to more sane and ethical
standards. Besides, much of the popular presentation of ideas 
by statistical devices is already highly constructive and beneficial. 

GENERAL RULES FOR PREPARING POPULAR STATISTICAL REPORTS

1. Such reports range all the way from (1) literary articles with 
a flavor of statistics to (2) a statistical report in a literary dress. 
The first type is frequently found in newspaper articles or even in 
editorials dealing with social, economic, or scientific materials. 
The second type is found in such popular business magazines as
Business Week, The Nation’s Business, Bradstreet’s Review, and The
Santa Fe Magazine.

2. The amount of statistical analysis and the amount of literary
dress combined in any given case should depend on (1) the purpose
of the article and (2) the type of reader to whom it is directed.
If it is an economic subject and the readers have been accus- 
tomed to thinking in terms of prices, production, and profits, more 
statistics may be included than would be suitable for political, 
social, or historical subjects for readers who think in terms of 
general ideas. But even general ideas can be enlivened and 
colored by well chosen data and analysis. 

3. Scientific and business reports should always be written in
the third person only, but popular statistical reports are sometimes
more effective in terms of the editorial “we,” or even occasionally
the first person singular, especially if the author is well known
and is recognized as an authority. For most popular reports,
however, it is usually more effective to use the third person as
the general rule, with an occasional lapse into the first person for
special appeal or emphasis. Popular reports cannot entirely escape
the functions of the editor or orator. Such a report is, in effect,
a popular appeal based on data.

4. Popular statistical reports of good quality are more difficult 
to prepare than rigid and exact scientific and business reports and 
should not be attempted by beginners until they have made a 
special study of such reports prepared by specialists in that field 
and have acquired some literary training and effectiveness in ex- 
pression. 

The most effective training for the preparation of any type of 
statistical report is the careful examination and detailed study of 
a number of the best existing reports in that particular field. The 
list of reports at the end of this chapter is designed to suggest to 
the teacher and student specific examples for such study. 

It is suggested that at least one semester of a student’s work in
statistics should close with a term problem in which the student
makes a statistical analysis of a degree of difficulty suitable to
his ability and writes it up in a formal, adequate report. Such
training under the conscientious direction of the teacher is a most
valuable part of a student’s training in statistics.

SELECTED REFERENCES 

SCIENTIFIC REPORTS

1. Agricultural Experiment Station Bulletins¹

1) California Agricultural Experiment Station, Technical Bulletins 

2) Illinois Agricultural Experiment Station, Technical Bulletins 

3) Iowa Agricultural Experiment Station, Technical Bulletins 

4) New York Agricultural Experiment Station, Technical Bulletins 

2. Smithsonian Miscellaneous Collections, published by the Smithsonian
Institution, Washington, D.C. (These are usually excellent models of
scientific reports and may be found in the document rooms of college
libraries and large city libraries. They range from Volume 1, published
in 1862, to Volume 102, published September 1, 1942.)

¹ The technical bulletins of all the State College Agricultural Experiment
Stations are similar and any of them will serve approximately as well as 
those of the four large states named above. Some of them can be found 
in any college library or large city library. Most of them are fair examples 
of a scientific report. 
BUSINESS REPORTS

1. American Economic Review, published by The American Economic 
Association, Washington, D.C. 

2. Canadian Journal of Economics and Political Science, The University 
of Toronto Press, Toronto, Canada. 

3. Harvard Business Review, published by McGraw-Hill Book Co., 
New York. 

4. Journal of Accountancy, published by American Institute Publish- 
ing Co., New York. 

5. Journal of Marketing, published by American Marketing Associa- 
tion, Menasha, Wisconsin. 

6. Review of Economic Statistics, published by Harvard University 
Press, Cambridge, Mass. 


POPULAR REPORTS 

1. Advertising and Selling, published by Robbins Publishing Co., Inc., 
Philadelphia, Pa. 

2. Dun’s Review, published by Dun & Bradstreet, Inc., New York.

3. The Santa Fe Magazine, published by Railway Exchange, Chicago. 




APPENDIX I 


GLOSSARY OF SYMBOLS AND 
INDEX OF IMPORTANT 
FORMULAS USED IN
THIS TEXT 


Latin Symbols 

$a, b, c, d$ = constants.

$b_{yx}$ = simple linear regression coefficient.

$b_{12.34\ldots n}$ = net regression coefficient.
$d_{12.34\ldots n}$ = coefficient of separate determination.
$d$ = a deviation.

$D$ = decile.

$e$ = natural logarithm base, 2.71828.

$f$ = frequency of a class interval.

$G$ = geometric mean.

$H$ = harmonic mean.
$i$ = class interval, as 4-7.9 or 4 to 8.

$I$ = index.
$k$ = a constant.

$L$ = lower limit of a class interval; in the classes 4-7.9 and 8-11.9,
4 and 8 are the lower limits.
$m$ = midpoint of a class interval; in the classes 4-7.9 and 8-11.9,
6 and 10 are the midpoints.

$Me$ = median.

$Mo$ = mode.

$n$ = degrees of freedom.

$N$ = total number of items in a sample; total of the frequencies in a
frequency distribution.
$p$ = percentage.

$p_0$ = price of base period.

$p_1$ = price of given period.

$p_1'$ = price of commodity at time 1.
$p_1''$ = price of commodity at time 2.
$p_1^{(n)}$ = price of commodity at time $n$.

$\dfrac{p_1}{p_0}$ = price relative.

$q$ = quantity weight.

$Q$ = quartile.

$r$ = coefficient of linear correlation.

$r^2$ = coefficient of linear determination.
$r_{12.34\ldots n}$ = coefficient of partial correlation.

$R_{1.234\ldots n}$ = coefficient of linear multiple correlation.

$R^2_{1.234\ldots n}$ = coefficient of linear multiple determination.

$s$ = standard deviation of a small sample.

$Sk$ = skewness of a frequency distribution.

$S_y$ = standard error of estimate for $y$.

$t$ = deviation of a given statistical measure from a hypothetical
value divided by its standard error.

$T$ = deviation of a given statistic from the mean of a normal
distribution divided by its standard deviation.
$u$ = a variable.

$v$ = a variable.

$V$ = coefficient of variation.

$W$ = weight.

$w$ = a variable.

$X$ = a variable, an item of data.

$\bar{X}$ = arithmetic mean.

$x$ = deviation from the true mean.

$x'$ = deviation from an assumed mean or origin.

$X_1$ = dependent variable.

$X_2, X_3, X_4, \ldots, X_n$ = independent variables.

$Y$ = a variable, an item of data.

$\bar{Y}$ = arithmetic mean.

$y$ = deviation from the mean.

$y'$ = deviation from an assumed mean or origin.
$z$ = value of $z$ corresponding to a given $r$, Appendix Table III.

$z$ = a residual, the deviation between an observed item and its
estimated value on the regression line.


Greek Symbols 

$\beta$ = beta coefficient.

$\pi$ = pi, ratio of the circumference of a circle to its diameter, 3.1416.

$\rho$ = rho, coefficient of simple curvilinear correlation.
$\rho^2$ = rho squared, coefficient of simple curvilinear determination.

$\mathrm{P}_{1.234\ldots n}$ = capital rho, coefficient of multiple curvilinear correlation.

$\mathrm{P}^2_{1.234\ldots n}$ = capital rho squared, coefficient of multiple curvilinear determination.
$\sigma$ = sigma, standard deviation.

$\sigma^2$ = variance.

$\sigma_{\bar{X}}$ = standard error of the arithmetic mean.

$\sigma_\sigma$ = standard error of the standard deviation.

$\sigma_r$ = standard error of the coefficient of linear correlation.

$\sigma_\rho$ = standard error of the index of correlation.

$\sigma_{\bar{X}_1 - \bar{X}_2}$ = standard error of the difference of two means.

$\sigma_{p_1 - p_2}$ = standard error of the difference of two percentages.

$\Sigma$, sigma, = sum of, or summation.

$\chi^2$ = chi-square.


Important Formulas 

1. Formula for locating midpoint of class interval

$m = L_1 + \dfrac{L_2 - L_1}{2}$, where $L_1$ and $L_2$ are the lower limits of the class and of the next class above.

2. Formulas for arithmetic mean

(1) $\bar{X} = \frac{\Sigma X}{N}$, for individual items.

(2) $\bar{X} = \frac{\Sigma fm}{N}$, for class intervals, long method.

(3) $\bar{X} = A + \frac{\Sigma fx'}{N}\, i$, for class intervals, short method.

(4) $\bar{X}_w = \frac{\Sigma WX}{\Sigma W}$, for weighted mean.
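The long and short methods of formulas (2) and (3) must give the same mean whatever assumed mean A is chosen. A minimal Python sketch, assuming a small hypothetical frequency table (midpoints m, frequencies f, class interval i = 5); the function names are illustrative only:

```python
def mean_long(midpoints, freqs):
    """Long method: X̄ = Σfm / N."""
    n = sum(freqs)
    return sum(f * m for f, m in zip(freqs, midpoints)) / n


def mean_short(midpoints, freqs, assumed_mean, i):
    """Short method: X̄ = A + (Σfx' / N) · i, with x' = (m - A) / i in class-interval units."""
    n = sum(freqs)
    sum_fx = sum(f * (m - assumed_mean) / i for f, m in zip(freqs, midpoints))
    return assumed_mean + sum_fx / n * i


def mean_weighted(items, weights):
    """Weighted mean: X̄w = ΣWX / ΣW."""
    return sum(w * x for x, w in zip(items, weights)) / sum(weights)


# Hypothetical grouped data: midpoints 12, 17, ..., 32 (i = 5).
mids = [12, 17, 22, 27, 32]
freqs = [3, 8, 12, 5, 2]
print(mean_long(mids, freqs))
print(mean_short(mids, freqs, assumed_mean=22, i=5))
```

Choosing A near the middle of the distribution keeps the x' values small, which is what made the short method attractive for hand computation.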
3. Formulas for geometric mean

(1) $G = \sqrt[N]{X_1 \cdot X_2 \cdot X_3 \cdots X_N}$, for individual items.

(2) $\log G = \frac{\log X_1 + \log X_2 + \log X_3 + \cdots + \log X_N}{N}$, or $\log G = \frac{\Sigma \log X}{N}$, for individual items.

(3) $\log G_w = \frac{W_1 \log X_1 + W_2 \log X_2 + \cdots + W_n \log X_n}{\Sigma W}$, or $\log G_w = \frac{\Sigma (W \log X)}{\Sigma W}$, weighted geometric mean.

(4) $\log G = \frac{\Sigma (f \log m)}{N}$, when class intervals are used.
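The logarithmic forms of the geometric mean translate directly into code. A short Python sketch, assuming common (base-10) logarithms as in the text and strictly positive items; the grouped version simply treats the frequencies as weights on the class midpoints. Function names are hypothetical:

```python
import math


def geometric_mean(items):
    """log G = Σ log X / N, then G = antilog (common logarithms); items must be positive."""
    log_g = sum(math.log10(x) for x in items) / len(items)
    return 10 ** log_g


def weighted_geometric_mean(items, weights):
    """log Gw = Σ(W log X) / ΣW."""
    log_g = sum(w * math.log10(x) for x, w in zip(items, weights)) / sum(weights)
    return 10 ** log_g


def geometric_mean_grouped(midpoints, freqs):
    """log G = Σ(f log m) / N when class intervals are used."""
    return weighted_geometric_mean(midpoints, freqs)


print(geometric_mean([2, 8]))
print(weighted_geometric_mean([2, 8], [3, 1]))
```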


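4. Formulas for harmonic mean

(1) $\frac{1}{H} = \frac{\Sigma\left(\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_N}\right)}{N}$, for individual items.

(2) $H = \frac{N}{\Sigma\left(\frac{f_1}{m_1} + \frac{f_2}{m_2} + \cdots + \frac{f_n}{m_n}\right)}$, when class intervals are used.

(3) $H_w = \frac{\Sigma W}{\Sigma\left(W_1\frac{1}{X_1} + W_2\frac{1}{X_2} + \cdots + W_n\frac{1}{X_n}\right)}$, weighted harmonic mean.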
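A minimal Python sketch of the three harmonic-mean formulas, assuming no item or midpoint is zero. The example uses the classic case where the harmonic mean is the right average: equal distances covered at different speeds. Names are illustrative only:

```python
def harmonic_mean(items):
    """1/H = Σ(1/X) / N, so H = N / Σ(1/X)."""
    return len(items) / sum(1 / x for x in items)


def harmonic_mean_grouped(midpoints, freqs):
    """H = N / Σ(f/m) when class intervals are used."""
    return sum(freqs) / sum(f / m for f, m in zip(freqs, midpoints))


def weighted_harmonic_mean(items, weights):
    """Hw = ΣW / Σ(W/X)."""
    return sum(weights) / sum(w / x for x, w in zip(items, weights))


# Equal distances driven at 30 and 60 miles per hour: the average speed is the
# harmonic mean of the two speeds, not their arithmetic mean.
print(harmonic_mean([30, 60]))
```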


5. Formula for quadratic mean

Quadratic mean $= \sqrt{\frac{X_1^2 + X_2^2 + X_3^2 + \cdots + X_N^2}{N}}$


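6. Formulas for median, quartiles, deciles, etc.

(1) $Me = L + \frac{\frac{N}{2} - F}{f}\, i$, in which L is the lower limit of the median class, F the total of the frequencies below that class, and f the frequency of the median class. Quartiles and deciles are located in the same way by replacing $\frac{N}{2}$ with $\frac{N}{4}$, $\frac{3N}{4}$, $\frac{N}{10}$, and so on.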
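Median, quartiles, and deciles of grouped data are all found by the same interpolation: locate the class containing the required position and move into it in proportion to its frequency. A Python sketch, assuming the usual rule point = L + (target − F)/f · i, where L is the lower real limit of that class, F the cumulative frequency below it, and f its frequency. The function `grouped_point` and its `fraction` parameter are hypothetical:

```python
def grouped_point(lower_limits, freqs, i, fraction=0.5):
    """Locate the point below which `fraction` of the N items fall, interpolating
    within the class that contains it: L + (fraction·N - F) / f · i.
    fraction=0.5 gives the median, 0.25 and 0.75 the quartiles, 0.1 the first decile."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must lie between 0 and 1")
    target = fraction * sum(freqs)
    cumulative = 0                      # F, the frequencies below the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * i
        cumulative += f
    raise ValueError("no class contains the requested point")


lowers = [9.5, 14.5, 19.5, 24.5, 29.5]   # hypothetical lower real limits, i = 5
freqs = [3, 8, 12, 5, 2]
print(grouped_point(lowers, freqs, 5))          # median
print(grouped_point(lowers, freqs, 5, 0.25))    # first quartile
```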
7. Formulas for mode

(1) $Mo = L + \frac{f_2}{f_1 + f_2}\, i$, interpolation formula.

(2) $Mo = \bar{X} - 3(\bar{X} - Me)$, locational formula.
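A Python sketch of the two mode formulas. It assumes the interpolation formula is read as L = lower limit of the class with the largest frequency, f1 = frequency of the class just below it, and f2 = frequency of the class just above; that reading is an assumption, and the function names are hypothetical:

```python
def mode_interpolated(lower_limits, freqs, i):
    """Mo = L + f2 / (f1 + f2) · i, taking L as the lower limit of the class with the
    largest frequency, f1 the frequency of the class below it and f2 of the class above."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])   # first class with the largest frequency
    f1 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    if f1 + f2 == 0:
        raise ValueError("the modal class has no neighbouring frequencies to interpolate with")
    return lower_limits[k] + f2 / (f1 + f2) * i


def mode_locational(mean, median):
    """Mo = X̄ - 3(X̄ - Me)."""
    return mean - 3 * (mean - median)


lowers = [9.5, 14.5, 19.5, 24.5, 29.5]   # hypothetical grouped data, i = 5
freqs = [3, 8, 12, 5, 2]
print(mode_interpolated(lowers, freqs, 5))
print(mode_locational(mean=21.0, median=20.5))
```

The locational formula is an empirical relation and is only expected to hold roughly, for moderately skewed distributions.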

8. Formula for range

$(X_n - X_1) + 1$

9. Formula for quartile deviation
$QD = \frac{Q_3 - Q_1}{2}$


10. Formulas for mean deviation

(1) $Md = \frac{|X_1 - Me| + |X_2 - Me| + \cdots + |X_N - Me|}{N}$, for individual items.

(2) $Md = \frac{\Sigma f|m - Me|}{N}$, for class intervals.


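11. Formulas for standard deviation

(1) $\sigma = \sqrt{\frac{\Sigma x^2}{N}}$, for individual items.

(2) $\sigma = i\sqrt{\frac{\Sigma fx'^2}{N} - \left(\frac{\Sigma fx'}{N}\right)^2}$, for class intervals.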
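A Python sketch comparing the direct formula with the short method for class intervals, assuming hypothetical grouped data. To compare the two, every item is treated as lying at its class midpoint; the term (Σfx'/N)² is the correction for measuring deviations from A instead of the true mean:

```python
import math


def sd_individual(items):
    """σ = sqrt(Σx² / N), where x is each item's deviation from the arithmetic mean."""
    n = len(items)
    mean = sum(items) / n
    return math.sqrt(sum((x - mean) ** 2 for x in items) / n)


def sd_grouped_short(midpoints, freqs, assumed_mean, i):
    """σ = i · sqrt(Σfx'²/N - (Σfx'/N)²), with x' = (m - A) / i."""
    n = sum(freqs)
    dev = [(m - assumed_mean) / i for m in midpoints]
    first = sum(f * d for f, d in zip(freqs, dev)) / n        # Σfx'/N, the correction
    second = sum(f * d * d for f, d in zip(freqs, dev)) / n   # Σfx'²/N
    return i * math.sqrt(second - first ** 2)


mids, freqs = [12, 17, 22, 27, 32], [3, 8, 12, 5, 2]
# Treat every item as lying at its class midpoint so the two methods can be compared.
expanded = [m for m, f in zip(mids, freqs) for _ in range(f)]
print(sd_individual(expanded), sd_grouped_short(mids, freqs, 22, 5))
```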


12. Formulas for skewness

(1) $Sk_q = \frac{Q_3 + Q_1 - 2Me}{Q_3 - Q_1}$

(2) $Sk = \frac{3(\bar{X} - Me)}{\sigma}$

13. Formula for coefficient of variation

$V = \frac{\sigma}{\bar{X}}\,100$
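The quartile and Pearson measures of skewness and the coefficient of variation need only summary figures. A Python sketch with hypothetical values; a positive skewness indicates a longer right tail, and the quartile measure always lies between −1 and +1 because the median falls between the quartiles:

```python
def bowley_skewness(q1, median, q3):
    """Sk_q = (Q3 + Q1 - 2Me) / (Q3 - Q1); lies between -1 and +1."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


def pearson_skewness(mean, median, sigma):
    """Sk = 3(X̄ - Me) / σ."""
    return 3 * (mean - median) / sigma


def coefficient_of_variation(sigma, mean):
    """V = σ / X̄ · 100, the standard deviation as a percentage of the mean."""
    return sigma / mean * 100


# Hypothetical summary figures for a distribution with a long right tail.
print(bowley_skewness(q1=10, median=12, q3=20))
print(pearson_skewness(mean=52, median=50, sigma=10))
print(coefficient_of_variation(sigma=10, mean=52))
```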


14. Formulas for linear regression (least squares)

(1) $Y = a + b_{yx}X$, or $X = a + b_{xy}Y$

(2) $\Sigma Y = Na + b\Sigma X$ and $\Sigma XY = a\Sigma X + b\Sigma X^2$; or $\Sigma X = Na + b\Sigma Y$ and $\Sigma XY = a\Sigma Y + b\Sigma Y^2$

(3) $a = \bar{Y} - b\bar{X}$

(4) $b_{yx} = \frac{\Sigma XY - N\bar{X}\bar{Y}}{\Sigma X^2 - N\bar{X}^2}$, in terms of the original data.

(5) $b_{yx} = \frac{\Sigma xy}{\Sigma x^2}$, in terms of deviations from the means.
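Formulas (3), (4), and (5) are two routes to the same least-squares line, and the resulting constants satisfy the normal equations of (2). A Python sketch on hypothetical paired data; the function names are illustrative:

```python
def regression_deviations(xs, ys):
    """a and b_yx from formulas (3) and (5): b = Σxy / Σx², a = Ȳ - bX̄."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def slope_original_data(xs, ys):
    """b_yx from formula (4): (ΣXY - N·X̄·Ȳ) / (ΣX² - N·X̄²)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    den = sum(x * x for x in xs) - n * mean_x ** 2
    return num / den


xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]   # hypothetical paired data
a, b = regression_deviations(xs, ys)
# The fitted constants satisfy the first normal equation ΣY = Na + bΣX.
print(a, b, sum(ys), len(xs) * a + b * sum(xs))
```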


15. Formulas for curvilinear regression

(1) Simple parabola
$Y = a + bX + cX^2$

(2) When the original data are used
$Na + b\Sigma X + c\Sigma X^2 = \Sigma Y$
$a\Sigma X + b\Sigma X^2 + c\Sigma X^3 = \Sigma XY$
$a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4 = \Sigma X^2Y$

(3) When the deviations from the means are used (U is substituted for $X^2$)
$a = \bar{Y} - b\bar{X} - c\bar{U}$
$(\Sigma x^2)b + (\Sigma xu)c = \Sigma xy$
$(\Sigma xu)b + (\Sigma u^2)c = \Sigma uy$

(4) Cubic parabola

(i) $Y = a + bX + cX^2 + dX^3$

(ii) When the original data are used
$Na + b\Sigma X + c\Sigma X^2 + d\Sigma X^3 = \Sigma Y$
$a\Sigma X + b\Sigma X^2 + c\Sigma X^3 + d\Sigma X^4 = \Sigma XY$
$a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4 + d\Sigma X^5 = \Sigma X^2Y$
$a\Sigma X^3 + b\Sigma X^4 + c\Sigma X^5 + d\Sigma X^6 = \Sigma X^3Y$

(iii) When the deviations from the means are used and U is substituted for $X^2$ and V for $X^3$
$(\Sigma x^2)b + (\Sigma xu)c + (\Sigma xv)d = \Sigma xy$
$(\Sigma xu)b + (\Sigma u^2)c + (\Sigma uv)d = \Sigma uy$
$(\Sigma xv)b + (\Sigma uv)c + (\Sigma v^2)d = \Sigma vy$
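The normal equations for the parabola and the cubic share one pattern: row k multiplies the curve by $X^k$ and sums. A Python sketch that builds those equations from the original data and solves them; it uses ordinary Gaussian elimination with partial pivoting as a stand-in for a hand method such as Doolittle's, and the function names are hypothetical:

```python
def solve(matrix, rhs):
    """Solve simultaneous linear equations by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(map(float, row)) + [float(r)] for row, r in zip(matrix, rhs)]  # augmented rows
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - known) / m[r][r]
    return solution


def fit_polynomial(xs, ys, degree):
    """Least-squares Y = a + bX + cX² (+ dX³ ...) from the normal equations
    written with original data: row k reads Σ(X^k·Y) = aΣX^k + bΣX^(k+1) + ..."""
    powers = [sum(x ** p for x in xs) for p in range(2 * degree + 1)]  # ΣX^p, powers[0] = N
    matrix = [[powers[r + c] for c in range(degree + 1)] for r in range(degree + 1)]
    rhs = [sum(x ** r * y for x, y in zip(xs, ys)) for r in range(degree + 1)]
    return solve(matrix, rhs)   # [a, b, c, ...]


xs = [0, 1, 2, 3, 4]
ys = [1, 6, 17, 34, 57]          # hypothetical data lying exactly on Y = 1 + 2X + 3X²
print(fit_polynomial(xs, ys, 2))
```

Calling `fit_polynomial(xs, ys, 3)` sets up the four cubic equations of (4)(ii) in the same way.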
(5) Logarithmic regression lines

(i) $\log Y = a + bX$
$Na + b\Sigma X = \Sigma(\log Y)$
$a\Sigma X + b\Sigma X^2 = \Sigma(X \log Y)$

(ii) $Y = a + b\log X$
$Na + b\Sigma \log X = \Sigma Y$
$a\Sigma \log X + b\Sigma(\log X)^2 = \Sigma(Y \log X)$

(iii) $\log Y = a + b\log X$
$Na + b\Sigma \log X = \Sigma \log Y$
$a\Sigma \log X + b\Sigma(\log X)^2 = \Sigma(\log X \cdot \log Y)$

(6) (i) Exponential curve: $y = ab^x$, or $\log y = \log a + (\log b)x$
$\Sigma(\log y) = N\log a + \log b\,\Sigma x$
$\Sigma(x \log y) = \log a\,\Sigma x + \log b\,\Sigma x^2$

(ii) Compound interest curve, a special case of exponential curves
$y = P(1 + r)^x$, or $\log y = \log P + x\log(1 + r)$
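Fitting the exponential curve reduces to fitting a straight line to the logarithms, which is what the two normal equations in (6)(i) express. A Python sketch, assuming common logarithms and positive Y values, applied to hypothetical compound-interest data where b recovers 1 + r:

```python
import math


def fit_exponential(xs, ys):
    """Fit Y = a·b^X by least squares on log Y = log a + (log b)X, using common logs.
    Solves ΣlogY = N·log a + log b·ΣX and Σ(X·logY) = log a·ΣX + log b·ΣX²."""
    n = len(xs)
    logs = [math.log10(y) for y in ys]             # every Y must be positive
    sx, sl = sum(xs), sum(logs)
    sxx = sum(x * x for x in xs)
    sxl = sum(x * l for x, l in zip(xs, logs))
    log_b = (n * sxl - sx * sl) / (n * sxx - sx * sx)
    log_a = (sl - log_b * sx) / n
    return 10 ** log_a, 10 ** log_b


# Hypothetical: 100 dollars at 5 per cent compound interest, Y = P(1 + r)^X.
years = list(range(6))
amounts = [100 * 1.05 ** x for x in years]
a, b = fit_exponential(years, amounts)
print(a, b - 1)   # P and r
```

Note that least squares is applied to log Y, not to Y itself, so the fit minimizes relative rather than absolute deviations.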


(7) Logistic

(1) $\frac{1}{y} = a + bc^x$

(2) $b = \frac{d_1(c - 1)}{(c^n - 1)^2}$



16. Formulas for standard error of estimate

(1) $S_y = \sqrt{\frac{\Sigma(Y - Y')^2}{N}}$

(2) $S_y = \sqrt{\frac{\Sigma z^2}{N}}$

(3), (4) [illegible in the original]

(5) $S_y^2 = \sigma_y^2 - r^2\sigma_y^2$

(6) $S_y = \sigma_y\sqrt{1 - r^2}$
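The residual form of the standard error of estimate and the shortcut $S_y = \sigma_y\sqrt{1 - r^2}$ should agree for a least-squares line. A Python sketch on hypothetical paired data; the helper names are illustrative:

```python
import math


def fit_line(xs, ys):
    """Least-squares a and b for Y' = a + bX (deviation formulas)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def std_error_of_estimate(xs, ys):
    """S_y = sqrt(Σz² / N), z = Y - Y' being the residual from the regression line."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(z * z for z in residuals) / len(xs))


def std_error_from_r(sigma_y, r):
    """S_y = σ_y · sqrt(1 - r²)."""
    return sigma_y * math.sqrt(1 - r * r)


xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]   # hypothetical paired data
print(std_error_of_estimate(xs, ys))
```

For these data σ_y² = 1.2 and r² = 0.6, so `std_error_from_r(1.2 ** 0.5, 0.6 ** 0.5)` gives the same value as the residual computation.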

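17. Formulas for computing coefficient of linear determination and correlation

(1) $r^2 = \frac{\sigma_{Y'}^2}{\sigma_y^2}$, in which $\sigma_{Y'}$ is the standard deviation of the estimated values Y'

(2) $r^2 = 1 - \frac{S_y^2}{\sigma_y^2}$

(3) [illegible in the original]

(4) $r = \frac{\sigma_{Y'}}{\sigma_y}$

(5) $r = \sqrt{1 - \frac{S_y^2}{\sigma_y^2}}$

(6) [illegible in the original]

(7) $r = \frac{\Sigma xy}{N\sigma_x\sigma_y}$

(8) $r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}$

(9) $r^2 = b_{yx} \cdot b_{xy}$

(10) $r = \sqrt{b_{yx} \cdot b_{xy}}$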
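The original-data formula (8) for r and the identity $r^2 = b_{yx} \cdot b_{xy}$ of formula (9) can be checked against each other. A Python sketch on hypothetical paired data; `slope` is a hypothetical helper that computes the regression coefficient of its second argument on its first:

```python
import math


def pearson_r(xs, ys):
    """r = [NΣXY - ΣX·ΣY] / sqrt([NΣX² - (ΣX)²][NΣY² - (ΣY)²]), from the original data."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def slope(us, vs):
    """Regression coefficient of v on u, Σuv / Σu², with deviations from the means."""
    mu, mv = sum(us) / len(us), sum(vs) / len(vs)
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / sum((u - mu) ** 2 for u in us)


xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]   # hypothetical paired data
r = pearson_r(xs, ys)
b_yx, b_xy = slope(xs, ys), slope(ys, xs)
print(r ** 2, b_yx * b_xy)   # formula (9): r² = b_yx · b_xy
```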

18. Formulas for the normal curve

(1) $y_o = \frac{Ni}{\sigma\sqrt{2\pi}} = \frac{Ni}{2.5066\sigma}$, for the maximum ordinate at zero (0) deviation from the mean

(2) $y_c = \frac{Ni}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}$, for any ordinate in the normal curve
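A Python sketch of the normal-curve ordinates, assuming a hypothetical distribution of N = 1000 items with σ = 10 and class interval i = 5. The second function gives the ordinate as a fraction of the maximum ordinate, which is the quantity tabulated in Appendix Table I:

```python
import math


def normal_ordinate(x, n, sigma, i=1.0):
    """y_c = Ni / (σ·sqrt(2π)) · e^(-x² / 2σ²), x being the deviation from the mean."""
    y_o = n * i / (sigma * math.sqrt(2 * math.pi))   # maximum ordinate, at x = 0
    return y_o * math.exp(-x * x / (2 * sigma * sigma))


def relative_ordinate(x_over_sigma):
    """Ordinate as a fraction of y_o, the quantity tabulated in Appendix Table I."""
    return math.exp(-x_over_sigma ** 2 / 2)


# Hypothetical distribution: N = 1000 items, σ = 10, class interval i = 5.
print(normal_ordinate(0, 1000, 10, 5))
print(relative_ordinate(0.7), relative_ordinate(2.15))
```

The two relative ordinates printed are the values quoted in the explanation of Appendix Table I (.78270 and .09914 of the mean ordinate).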

19. Formulas for standard errors

(1) The mean: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$

(2) The standard deviation: $\sigma_\sigma = \frac{\sigma}{\sqrt{2N}}$

(3) The coefficient of correlation: $\sigma_r = \frac{1 - r^2}{\sqrt{N}}$

(4) The standard error of the difference between two means, when computed from the two standard deviations:
$\sigma_D = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}$

(5) When computed from the standard errors of the two means:
$\sigma_D = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}$


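20. The t-test formulas

(1) Difference between two means
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_D}$, in which $\sigma_D = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}$

(2) Difference between two regression coefficients
$t = \frac{b_1 - b_2}{\sigma_{b_1 - b_2}}$, in which $\sigma_{b_1 - b_2} = \sqrt{\sigma_{b_1}^2 + \sigma_{b_2}^2}$

(3) Test of significance of coefficients of correlation
$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$

(4) Difference between two percentages
$t = \frac{p_1 - p_2}{\sigma_{p_1 - p_2}}$, in which $\sigma_{p_1 - p_2} = \sqrt{\frac{p_1q_1}{N_1} + \frac{p_2q_2}{N_2}}$,
p = the proportion of favorable occurrences and q = the proportion of unfavorable occurrences (q = 1 − p)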
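A Python sketch of t-tests (1), (3), and (4), assuming hypothetical sample figures and percentages expressed as proportions. Judging the size of t still requires the appropriate table for the number of cases:

```python
import math


def t_difference_of_means(mean1, sd1, n1, mean2, sd2, n2):
    """t = (X̄1 - X̄2) / σ_D, with σ_D = sqrt(σ1²/N1 + σ2²/N2)."""
    sigma_d = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / sigma_d


def t_correlation(r, n):
    """t = r·sqrt(N - 2) / sqrt(1 - r²)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_difference_of_percentages(p1, n1, p2, n2):
    """t = (p1 - p2) / σ, with σ = sqrt(p1q1/N1 + p2q2/N2) and q = 1 - p (p as proportions)."""
    sigma = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / sigma


print(t_difference_of_means(52, 10, 100, 50, 10, 100))
print(t_correlation(0.6, 27))
print(t_difference_of_percentages(0.60, 100, 0.50, 100))
```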


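21. Equations for Chi-square

(1) $\chi^2 = \frac{(X_1 - m_1)^2}{m_1} + \frac{(X_2 - m_2)^2}{m_2} + \cdots$

(2) $\chi^2 = \Sigma\left[\frac{(X - m)^2}{m}\right]$, for a series of frequencies, in which X is an observed and m the corresponding theoretical frequency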
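A Python sketch of the chi-square sum, assuming observed and theoretical frequencies are given class by class. The coin example is hypothetical; whether the result is significant depends on the degrees of freedom and a chi-square table:

```python
def chi_square(observed, expected):
    """χ² = Σ (X - m)² / m, X an observed and m the corresponding theoretical frequency."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((x - m) ** 2 / m for x, m in zip(observed, expected))


# Hypothetical: 100 tosses of a coin gave 60 heads and 40 tails; theory expects 50 and 50.
print(chi_square([60, 40], [50, 50]))
```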


22. Formulas for secular trend in time series

(1) $Y = a + b_{yx}X$, for linear trends, short cut (X measured from the middle of the series, so that $\Sigma X = 0$)

(2) $a = \frac{\Sigma Y}{N}$

(3) $b_{yx} = \frac{\Sigma XY}{\Sigma X^2}$

(4) $Y = a + bX + cX^2$, for simple parabola, short cut

(5) $b = \frac{\Sigma XY}{\Sigma X^2}$

(6) $Na + c\Sigma X^2 = \Sigma Y$ and $a\Sigma X^2 + c\Sigma X^4 = \Sigma X^2Y$, solution of a- and c-values

(7) $\log Y = a + bX$, for simple log line

(8) $Y = a + bX + cX^2 + dX^3$, for cubic parabola, short cut

(9) $\Sigma Y = Na + c\Sigma X^2$ and $\Sigma X^2Y = a\Sigma X^2 + c\Sigma X^4$, solution of a- and c-values;
$\Sigma XY = b\Sigma X^2 + d\Sigma X^4$ and $\Sigma X^3Y = b\Sigma X^4 + d\Sigma X^6$, solution of b- and d-values

(10) $\log Y = a + bX$

(11) $a = \frac{\Sigma \log Y}{N}$

(12) $b = \frac{\Sigma(X \log Y)}{\Sigma X^2}$
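The short cuts work because measuring X from the middle of the series makes ΣX (and ΣX³) zero, so the normal equations separate. A Python sketch of the linear and parabolic short cuts, assuming that for an even number of periods the origin falls between the two middle periods (X = ..., −0.5, 0.5, ...), which makes b the change per period. Function names are hypothetical:

```python
def centred_origin(n):
    """X values measured from the middle of an n-period series, so that ΣX = 0.
    For an even n the origin falls between the two middle periods (X = ..., -0.5, 0.5, ...)."""
    return [k - (n - 1) / 2 for k in range(n)]


def linear_trend(values):
    """Y = a + bX by the short cut: a = ΣY / N, b = ΣXY / ΣX²."""
    xs = centred_origin(len(values))
    a = sum(values) / len(values)
    b = sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)
    return a, b


def parabola_trend(values):
    """Y = a + bX + cX² by the short cut: b = ΣXY / ΣX², and a, c from
    Na + cΣX² = ΣY and aΣX² + cΣX⁴ = ΣX²Y (solved here by Cramer's rule)."""
    n = len(values)
    xs = centred_origin(n)
    s2 = sum(x ** 2 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sy = sum(values)
    sx2y = sum(x * x * y for x, y in zip(xs, values))
    b = sum(x * y for x, y in zip(xs, values)) / s2
    det = n * s4 - s2 * s2
    a = (sy * s4 - s2 * sx2y) / det
    c = (n * sx2y - s2 * sy) / det
    return a, b, c


print(linear_trend([10, 12, 14, 16, 18]))
print(parabola_trend([9, 2, 1, 6, 17]))
```

Some texts number an even-length series by odd integers (..., −3, −1, 1, 3, ...) instead; b then measures the change per half period.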


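23. Residual method of computing cycles

(1) $(C \times I) = \frac{T \times C \times S \times I}{T \times S}$

(2) $(C \times I) = \frac{S \times C \times I}{S}$, or

(3) $(C \times I) = \frac{T \times C \times I}{T}$

(4) $(C + I) = \frac{T(S + C + I)}{T} - S$

C = Cycle
I = Irregular change
S = Seasonal variation
T = Secular trend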


24. Formulas for index numbers

(1) Unweighted aggregative index of prices

$I = \frac{\Sigma(p_1' + p_1'' + p_1''' + \cdots + p_1^n)}{\Sigma(p_0' + p_0'' + p_0''' + \cdots + p_0^n)} = \frac{\Sigma(p_1)}{\Sigma(p_0)}$
(2) Weighted aggregative index of prices

$I = \frac{\Sigma[(p_1q_0)' + (p_1q_0)'' + (p_1q_0)''' + \cdots + (p_1q_0)^n]}{\Sigma[(p_0q_0)' + (p_0q_0)'' + (p_0q_0)''' + \cdots + (p_0q_0)^n]} = \frac{\Sigma(p_1q_0)}{\Sigma(p_0q_0)}$

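(3) Unweighted average of price relatives index

$I = \frac{\left(\frac{p_1'}{p_0'}\right) + \left(\frac{p_1''}{p_0''}\right) + \left(\frac{p_1'''}{p_0'''}\right) + \cdots + \left(\frac{p_1^n}{p_0^n}\right)}{N}$

(4) Weighted average of price relatives index in which the weights $(p_0q_0)$, or price × quantity, or value are used

$I = \frac{\Sigma\left[\left(\frac{p_1}{p_0}\right)(p_0q_0)\right]}{\Sigma(p_0q_0)}$

(5) Weighted geometric average of price relatives index using logarithms

$\log I = \frac{\Sigma\left[p_0q_0 \log\left(\frac{p_1}{p_0}\right)\right]}{\Sigma(p_0q_0)}$

(6) Quantity indexes weighted by prices

$I = \frac{\Sigma[(q_1'p_0') + (q_1''p_0'') + (q_1'''p_0''') + \cdots + (q_1^np_0^n)]}{\Sigma[(q_0'p_0') + (q_0''p_0'') + (q_0'''p_0''') + \cdots + (q_0^np_0^n)]} = \frac{\Sigma(q_1p_0)}{\Sigma(q_0p_0)}$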
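A Python sketch of the main index-number formulas, each expressed as a percentage. It assumes base-period quantities q0 as the weights of the weighted aggregative price index, and uses hypothetical prices and quantities for three commodities. With value weights p0q0, the weighted average of relatives comes out identical to the weighted aggregative index, which the example illustrates:

```python
def aggregative_index(p0, p1):
    """Unweighted aggregative index: Σp1 / Σp0 × 100."""
    return sum(p1) / sum(p0) * 100


def weighted_aggregative_index(p0, p1, q0):
    """Aggregative index weighted by base-period quantities: Σp1q0 / Σp0q0 × 100."""
    given = sum(price * qty for price, qty in zip(p1, q0))
    base = sum(price * qty for price, qty in zip(p0, q0))
    return given / base * 100


def average_of_relatives(p0, p1):
    """Unweighted average of price relatives: Σ(p1/p0) / N × 100."""
    return sum(new / old for old, new in zip(p0, p1)) / len(p0) * 100


def weighted_average_of_relatives(p0, p1, q0):
    """Relatives weighted by base values p0q0: Σ[(p1/p0)(p0q0)] / Σ(p0q0) × 100."""
    values = [old * qty for old, qty in zip(p0, q0)]
    return sum(new / old * v for old, new, v in zip(p0, p1, values)) / sum(values) * 100


def quantity_index(q0, q1, p0):
    """Quantity index weighted by base-period prices: Σq1p0 / Σq0p0 × 100."""
    return weighted_aggregative_index(q0, q1, p0)


p0, p1 = [2, 5, 10], [3, 5, 12]      # hypothetical base and given-year prices
q0, q1 = [10, 4, 2], [12, 4, 1]      # hypothetical base and given-year quantities
print(aggregative_index(p0, p1), weighted_aggregative_index(p0, p1, q0))
print(average_of_relatives(p0, p1), weighted_average_of_relatives(p0, p1, q0))
print(quantity_index(q0, q1, p0))
```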


25. Formulas for analysis of variance

(1) $\sigma^2 = \frac{\Sigma x^2}{N}$, variance is the square of σ

(2) $\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N}$, total squared deviations from the mean

(3) $\frac{(\Sigma X)^2}{N}$ = correction value for the total of all classes

(4) $\frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} + \cdots + \frac{(\Sigma X_k)^2}{n_k} - \frac{(\Sigma X)^2}{N}$ = total sum of squares between class means
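The correction value (ΣX)²/N is subtracted once from the raw sum of squares to give the total, and once from the sum of the squared class totals (each divided by its class size) to give the part between class means; the remainder lies within classes. A Python sketch on hypothetical data for two classes:

```python
def anova_sums_of_squares(groups):
    """Return (total, between, within) sums of squared deviations:
    total   = ΣX² - (ΣX)²/N
    between = (ΣX1)²/n1 + (ΣX2)²/n2 + ... - (ΣX)²/N
    within  = total - between"""
    items = [x for group in groups for x in group]
    correction = sum(items) ** 2 / len(items)            # (ΣX)²/N, the correction value
    total = sum(x * x for x in items) - correction
    between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    return total, between, total - between


# Hypothetical yields from two classes (treatments) of three plots each.
print(anova_sums_of_squares([[1, 2, 3], [4, 5, 6]]))
```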


APPENDIX II 

APPENDIX OF TECHNICAL TERMS


A 

ABSCISSA = the distance from the Y-axis to a point, measured along a
line parallel to the X-axis. If the point is to the right of the Y-axis
on the graph, the abscissa is positive. If the point is to the left
of the Y-axis, the abscissa is negative.

ADDITIVE RELATIONSHIP or ADDITIVE FUNCTION = the joining of
two or more constants or values so that the result is their sum.

AGGREGATE = the sum or total of a group of values or numbers.

AGGREGATIVE INDEX NUMBER = an index number computed by 
dividing the sum of values for a given period or area by the sum 
of values for a base period or area for the same series. The given 
total is divided by the base total and the quotient expressed as a 
percentage. 

ALGEBRAIC SUM = the sum of a group of numbers with signs + and/or −,
taken so that the sum is the difference between the sum
of the + numbers and the sum of the − numbers.

AMPLITUDE = the maximum distance from the lowest to the high- 
est points in a curve. 

ANALYSIS OF COVARIANCE = a combination of the analysis of
variance with the analysis of regression and correlation in which 
the significance of the variation between classes is computed after 
the influence of one or more other variables has been removed. 
ANALYSIS OF VARIANCE = a method of computing the degree of
significance in the variations of a particular characteristic among
comparable groups or classes of data.

ARBITRARY ORIGIN = any value which is used as a zero point from 
which to compute deviations on a scale such as the guessed average 
or assumed mean used in computing the arithmetic mean by the 
short method. 

ARITHMETIC MEAN or AVERAGE = the sum of the items, numbers, or
magnitudes in a series divided by their number, $\bar{X} = \frac{\Sigma X}{N}$.

ARITHMETIC SCALE = a scale on which equal absolute numbers are
represented by equal spaces on the scale regardless of where they 
appear on the scale. 

ARRAY = an orderly arrangement of numbers or values according 
to magnitude usually from the smallest to the largest. 

AVERAGE = any measure of central tendency. Frequently used 
to indicate the arithmetic mean. 

AVERAGE DEVIATION or MEAN DEVIATION = the arithmetic mean
of the absolute deviations of the items of a series of data from a
measure of central tendency, usually the median, but sometimes
the arithmetic mean. It is a minimum when measured from the median,
and in a normal distribution it equals .7979σ.

AXIS, X-axis, Y-axis = the horizontal and the vertical scales or lines
of a coordinate graph.


B 

$b_{yx}$ = symbol for the total regression coefficient of Y on X when
both are expressed as original data or deviations from any origin
but not in standard deviations. Also expressed as $b_{12}$ for $X_1$ and $X_2$.

$b_{12.34\ldots n}$ = symbol of partial or net regression, expressing the
relationship between $X_1$ and $X_2$ after including $X_3, X_4, \ldots, X_n$ in
the computations and removing their influence on $X_1$, when the
data are not expressed in terms of their standard deviations.
BAND CHART = a graph composed of several irregular zones extending
across the scale from left to right indicating several series
of varying magnitudes.

BAR CHART = a graph composed of a series of bars of varying length 
each of which represents comparable magnitudes. The bars may 
be either vertical or horizontal and are all measured from the same 
base. 

BASE = a value, price, quantity, or other magnitude in space or 
time from which a series of items are measured. In graphs and 
cycles the base is usually zero (0). In seasonal variation and in
index numbers it is usually 100. In ratios the base is the de- 
nominator of the fraction of which the ratio is the quotient. 

BETA REGRESSION COEFFICIENT = the value of a regression coefficient
when it expresses the relationship between two variables
in terms of their standard deviations, $\beta_{12.34\ldots n} = b_{12.34\ldots n}\frac{\sigma_2}{\sigma_1}$.

BIASED SAMPLE = a sample not taken at random or one which does
not represent the population from which it is drawn. 

BIMODAL = a frequency distribution which has two maximum
frequencies of equal size is said to be bimodal; having two modes.

BINOMIAL = an algebraic expression consisting of two terms,
as (a + b) or (x − y).

BINOMIAL EXPANSION = the expansion of a binomial to a given
power, as $(a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$.

BUSINESS CYCLE = the variation of business activity which remains
after secular trend and seasonal variation have been removed. The
more or less regular movement of business activity
through successive periods of prosperity and depression. It may
be a cycle for a single series or an average of a group of series.

C 

CAPTION = the heading of a column or vertical space in a table. 

CELL = the space enclosed in a table by the intersection of two 
parallel vertical and two parallel horizontal lines. 
CENTERING = the process of shifting a moving average or trend
from the beginning to the middle of a time period.

CHECK SUM = the sum of a group of data which may be used to 
check the accuracy of computations in a worksheet. It is a grand 
total which checks against sub-totals or group totals. 

CHI-SQUARE, SQUARE CONTINGENCY = the sum of the quotients
obtained by dividing the square of the difference between
an actual and an assumed or theoretical frequency by the assumed
frequency, $\chi^2 = \Sigma\frac{(X - m)^2}{m}$, in which X is the actual and m the assumed frequency.

CLASS INTERVAL, i = the width of a class, as 10-14.99, 15-19.99, or
9.50-14.49, 14.50-19.49, or 10 to 15, 15 to 20.

COEFFICIENT = a statistical constant or ratio which is independent
of the units in which the data are measured, as $b_{12}$, the
regression coefficient between X and Y, or $X_1$ and $X_2$; $r_{xy}$ is the
coefficient of correlation between X and Y.

COEFFICIENT OF CORRELATION = a measure of the amount of 
variation in a dependent variable which is associated with varia- 
tion in one or more independent variables expressed as the square 
root of a percentage. Complete or perfect correlation is designated 
as 1.00. 

COEFFICIENT OF TOTAL DETERMINATION = a measure of the amount 
of variation in a dependent variable which is associated with one 
or more independent variables expressed as a percentage. 

COEFFICIENT OF PARTIAL CORRELATION = a measure of the net 
relationship of a dependent variable to one independent variable 
after the effect of other independent variables has been computed 
out or removed; $r_{12.34\ldots n}$.

COEFFICIENT OF SKEWNESS = a measure of the one-sidedness or
asymmetry of a frequency distribution. If the left side of the
distribution is longer and thinner than the right, the distribution
is said to be skewed to the left. If the right side is the longer, it
is skewed to the right. In a moderately skewed distribution the mean
falls nearest the skewed end and the mode falls farthest from it,
while the median falls between the mean and the mode.

COEFFICIENT OF VARIATION = the measure of the relative variability
of a frequency distribution based on the ratio of its standard
deviation to its mean expressed as a percentage, $V = \frac{\sigma}{\bar{X}}\,100$.

COLUMN = a series of spaces or numbers in a vertical space in a 
table. 

CONSTANT = a quantity which retains the same value throughout
a problem or series of computations; in Y = a + bX, a and b
are constants.

CONTINUOUS DATA = values which may be measured in infinitesimally
small fractions of a unit of measure, as tons of coal, acres
of land, or ounces of gold.

CORRECTION = the quantity which must be added to or subtracted 
from one value to give another desired value. (1) The sum of 
the deviations from an assumed mean divided by their number. 
(2) N times the product of the means which must be subtracted 
from the sum of the products of original data to give the sum of 
the products of the deviations from the means.

CORRELATION TABLE = a coordinate table of two frequency
distributions designed to facilitate the computations of the sums of
the squared deviations and products of the two variables required 
for the coefficient of linear correlation. 

CRITICAL RATIO = the difference between two comparable statistics
divided by the standard error of that difference, also called
the t-test; as $t = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_D}$.

CUBIC PARABOLA = the curve fitted by the equation

$Y = a + bX + cX^2 + dX^3$

in which a, b, c, and d are constants based upon the average
relationship in the data.
CUMULATIVE FREQUENCY DISTRIBUTION = a tabulation of frequencies
formed by (a) adding to the frequency of each succeeding
class the total of the frequencies of all preceding classes, or
(b) adding to the frequency of each preceding class the total of all
succeeding classes. A frequency distribution obtained by the
(a) method is cumulated on the “less than” basis and those
obtained by the (b) method are cumulated on the “more than”
basis.

CURVILINEAR CORRELATION = any correlation based on regression
lines which are non-linear. 

CURVILINEAR REGRESSION = any regression between two variables
which is expressed by a line that is other than straight. 

D 

DATA = the raw materials for statistical studies; any group of
facts, figures, measurements, values, or numbers.

DECILE = any one of the nine points which divide a frequency dis- 
tribution into ten equal parts. 

DEPENDENT VARIABLE = the variable whose estimates depend on
another variable or variables. A causal relationship is not essen- 
tial. All that is necessary is that the relationship between the 
variables is set up so that values for the dependent variable are 
estimated from the other variable on the basis of some function or 
relationship. 

DEVIATION = the difference between two quantities or values, or 
between two statistics or between a statistic and an item of data, 
as deviation from the mean; absolute error. 

DISCRETE DATA = values of a variable which may be taken at 
only certain limited points, usually in whole units, as men, wives, 
or soldiers. 

DOOLITTLE METHOD = a systematic method for the rapid solution
of simultaneous equations. Its details are explained in Chapter 23
of this book. If a statistician has many long equations to
solve, the Doolittle method should be mastered.
E

ERROR = (1) the difference between an observed and an estimated
value. (2) Differences between observed values and their mean. 
(3) Differences between a statistic of a sample and the parameter 
of the population from which the sample is taken. 

ERROR OF ESTIMATE = (1) the deviation of an observed item of a 
dependent variable from the regression line, (Y − Y') = z. (2)
The difference between any estimated value and an observed value. 

EXPONENTIAL CURVE = a curve determined by the equation
$y = ab^x$, in which a and b are constants.

F 

FACTOR = (1) a variable considered in a statistical problem.
(2) a characteristic or quality common to two or more variables. 

FREE-HAND CURVE = a line drawn through a graph of plotted data 
to represent their relationship without the computation of a 
mathematical equation. 

FREQUENCY CURVE = any curve or graph which represents the 
size of the frequencies of a frequency distribution or the number of 
items in successive class intervals. 

FREQUENCY DISTRIBUTION = a table showing the number of items 
in a sample which fall in each one of a series of succeeding classes. 

FUNCTION = any variable whose value depends on another variable,
as Y = a + bX. The value of Y depends on the value of X, or
Y is a function of X.


G 

GEOMETRIC MEAN = the nth root of the product of n factors. It
may be computed by finding the antilogarithm of the mean of the
logarithms of the sample items.

GUESSED MEAN = estimated average, assumed mean, or guessed 
average, an arbitrary point from which an arithmetic mean may 
be computed by the short method. 
H

HARMONIC MEAN = the reciprocal of the arithmetic mean of the
reciprocals of the sample items. 

HISTOGRAM = a graph which represents the magnitude of the fre- 
quencies of a frequency distribution by adjoining vertical bars. 

I 

INDEPENDENT VARIABLE = any variable which may be given any
desired value which will in turn determine the value of an associated
dependent variable, as in Y = a + bX. The assigning of any
value to X determines the associated value of Y.

INDEX NUMBER = a computed value which in relation to a base 
measures the relative change of a quantity from time to time or 
area to area. An index is usually an average of some type ex- 
pressed as a percentage. 

INTERCEPT = the distance on either axis, X or Y, from the point
of origin to the point at which a regression line crosses that axis. 

INTERPOLATION = any process of estimating values between two 
or more known points. 

INTERQUARTILE RANGE = the distance on the range scale of a
frequency distribution between $Q_1$ and $Q_3$.

J 

J-SHAPED CURVE = a frequency distribution which is high at one
end and low at the other instead of low at the ends and high in the 
middle. 

JOINT FUNCTION = the joining of two or more functions so that 
their gross result is other than their sum. Usually they are mul- 
tiplied or divided in their effect by being joined. 

L 

LAG = the length of time by which change in one variable or 
series follows change in another. 
LEAD = the length of time by which change in one variable or
series precedes change in another. 

LINEAR = a relationship which may be expressed by a straight line. 

LINEAR CORRELATION = correlation assuming a straight-line
relationship between or among the variables.

LOGARITHM = an exponent indicating the power to which a base 
must be raised to equal a number. There are two systems of 
logarithms, (1) the natural logarithms based on e (2.7182818),
which is not used much in elementary statistics, (2) the common 
logarithms based on 10 which are very useful in statistical com- 
putations and a table of which is included in Appendix Table V. 
A logarithm has two parts, the characteristic or whole number, 
and the mantissa or fractional part. Only the mantissas appear in 
tables of logarithms. 

LOGARITHMIC SCALE = a scale for charts on which equal spaces 
are given to equal percentages. 

LORENZ CURVE = a graph which measures cumulative values of
two variables as portions of 100 percent for the total on the two 
axes. A straight line from the lower left-hand corner to the upper 
right-hand corner of the graph indicates an even or equal dis- 
tribution of the variables throughout their range. As the con- 
centration of either quantity becomes more marked in either the 
upper or lower limits of the 100 range, the line becomes more 
curved. 


M 

MEASURE OF CENTRAL TENDENCY = any one of the averages, mean,
median, mode, etc. 

MEASURE OF DISPERSION = a measure of the scatter of the items 
around some measure of central tendency, as range, standard 
deviation, average deviation, etc. 

MEDIAN = that point on the range of a frequency distribution which
divides the observations into two equal parts.

MEDIAN CLASS = the class in which the median falls. 
METHOD OF LEAST SQUARES = that method of computing a statistic
which reduces the sum of the squared deviations from the point or
line to a minimum.

MID-POINT = the value at the middle of a class interval. 

MODE = that value of a distribution which occurs most
frequently.

MOVING AVERAGE = a series of averages obtained by averaging a
given number of items in a series successively so that for each 
successive group the first item of the previous group is dropped 
and the next item below that group is added. 
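The dropping-and-adding procedure in this definition amounts to keeping a running total. A Python sketch, with a hypothetical function name; each average is attached to the group it summarizes, and centering it on the middle of its period is a separate step:

```python
def moving_average(series, k):
    """Average each successive group of k items: for each new group the first item of
    the previous group is dropped from a running total and the next item is added."""
    if not 1 <= k <= len(series):
        raise ValueError("k must be between 1 and the length of the series")
    window = sum(series[:k])
    averages = [window / k]
    for j in range(k, len(series)):
        window += series[j] - series[j - k]
        averages.append(window / k)
    return averages


print(moving_average([1, 2, 3, 4, 5], 3))
```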

MULTIPLE CORRELATION = the correlation between one dependent
variable and two or more independent variables weighted by the 
net regression coefficients between the dependent and each of the 
independent variables. 


N 

N = the number of items in a sample. 

NET REGRESSION COEFFICIENTS = the relation between a depend- 
ent variable and an independent variable after the effect of one 
or more other independent variables has been brought into the 
problem and its effect held constant, or computed out of the result.

NORMAL DISTRIBUTION = a distribution which falls in a bell-shape 
so that the two variations or slopes from the middle are uniform 
and in the shape of the normal probability curve. 

NORMAL PROBABILITY CURVE = the curve fitted or expressed by
the normal probability equation, $y = \frac{Ni}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}$.

O

ORDINATE = the distance from the X-axis to a point measured
parallel to the Y-axis.

ORGANIZATION TABLE = a table designed and used to reveal and 
measure the relationship between two or more related variables 
expressed as class intervals. The prime tool of tabular analysis. 
P

PARABOLA = a curve determined by the equation
$Y = a + bX + cX^2$, or $y = ax^2$, in which a, b, and c are constants.

PARTIAL CORRELATION = same as coefficient of partial correlation.

π, PI = the ratio of the circumference of a circle to its diameter,
3.1416 or $\frac{22}{7}$.

PICTOGRAM = a statistical chart composed of pictures arranged to
show magnitudes. 

PIE CHART or PIE DIAGRAMS = a graph in the form of a circle, usu- 
ally with its circumference divided into 100 equal parts to meas- 
ure percentages. 

POINT BINOMIAL = the term coefficients obtained by the expansion
of the binomial expression $(a + b)^n$ when (a + b) = 1 and n
is a positive finite integer, as $(\frac{1}{2} + \frac{1}{2})^n$.

POPULATION = the entire field or group of items from which a 
sample is taken. Same as universe. 

PRICE RELATIVE = the price of any single commodity at any given
time divided by the price of the same commodity for another time, 
called the base, expressed as a percentage. 

PROBABILITY = “The probability of an event is the relative frequency
with which this event recurs in an indefinitely prolonged
sequence or series of observations.”¹ The likelihood of the
occurrence of an event.


Q 

QUADRATIC MEAN or ROOT-MEAN-SQUARE = the square root of the
mean of the squared items.

¹ Mises, Richard von, Probability, Statistics and Truth, The Macmillan
Co., N.Y., 1939.
QUARTILE, $Q_1$, $Q_2$, $Q_3$ = one of the three points on the range scale
of a frequency distribution which divide the total of the
frequencies into four equal parts.

QUESTIONNAIRE = an orderly arrangement of questions related
to a statistical problem submitted to a number of persons for 
answers. Questionnaires are usually submitted by mail. 

QUINTILE = one of the four points on the range scale of a
frequency distribution which divide the total of the frequencies
into five equal parts.

R 

$R_{1.234\ldots n}$ = symbol of the coefficient of linear multiple correlation.
$R^2_{1.234\ldots n}$ = symbol of the coefficient of linear multiple determination.

RANDOM SAMPLING = the process of selecting a sample from a pop- 
ulation so that every item in the population has an equal and 
independent chance of being included in the sample. 

RANGE = the total distance between the smallest and the largest 
items in a sample plus one unit. 

RECIPROCAL = the reciprocal of a number is 1 divided by that
number. The reciprocal of 5 is 1/5 or .2.

REGRESSION COEFFICIENT = the average measure of the number of 
units of change in a dependent variable for each unit of change 
in the independent variable. 

REGRESSION LINE = the line which describes the relationship be- 
tween two variables based on the average relationship between 
them. 

RELATIVE = the quotient representing the relationship of the 
values of two times or two places expressed as a percentage. 

REPLICATION = the division of an experiment into subdivisions or
cells to secure greater cross-section uniformity of data among
variables or to reduce the effects of exterior variables.

RESIDUAL = the difference between an actual and an estimated
value, as (Y − Y').
ROUNDING = expressing a number in a more general form by
changing one or more of its digits at the extreme right to zeros 
and either raising or lowering the next digit to the left as the 
digits altered are less than or more than 5, as 473 may be rounded 
to 470 if only one place is involved or to 500 if two places are 
involved. 

ROW = the series of numbers extending crosswise or parallel to 
the X-axis in a table. 


S 

SAMPLE = a limited or finite number of items or data selected 
from a population or universe. 

SAMPLING RELIABILITY = the degree of accuracy with which a 
sample represents its population. 

SCATTER DIAGRAM or CHART = a graph of coordinate scales on
which the paired values of the data of X and Y are located by 
dots or marks opposite the appropriate points on the two axes. 

SCHEDULE = an orderly arrangement of questions related to a 
statistical problem which is to be filled out by an interviewer 
from information secured during an interview. 

SEASONAL INDEX = an index usually in the form of a percentage 
with the median of the year as a base of 100 which measures the 
changes in an activity from month to month. 

SEASONAL VARIATION = that change in a time series which is due 
to the season or time of year. 

SECULAR TREND = a line describing the long time expansion or 
contraction of values in a time series. 

SPATIAL SERIES = data based on location or space instead of time 
or duration. 

STANDARD DEVIATION = σ, the square root of the mean of the
squared deviations of the items of a sample from its arithmetic
mean. In a normal distribution one σ plus and minus from the
arithmetic mean includes 68.27% of the items in the sample if the
sample is large.
STANDARD ERROR OF ESTIMATE = the square root of the mean of
the squared deviations of the values of the dependent variable
from the regression line, $S_y = \sqrt{\frac{\Sigma(Y - Y')^2}{N}}$ or $\sqrt{\frac{\Sigma z^2}{N}}$.


STANDARD SCORE = a deviation from the mean divided by the
standard deviation of the distribution, as $\frac{x}{\sigma_x} = T$.


STATISTIC = a value computed from a sample, such as a mean, 
mode, range, standard deviation, etc. 


T 

t = the ratio of a statistic to its standard error.
T = a statistic in terms of its standard deviation.

TABULATION = the orderly arrangement of data (1) in tables of
columns and rows, or (2) in frequency distributions, arrays, or 
other systematic forms. 

TALLY SHEET = a sheet or table with captions and stubs for class 
intervals and frequencies into which data are recorded by a tally 
mark for each item. 

TERTILE = either one of the two points on the range of class in- 
tervals which divide the frequency distribution into three equal 
parts. 

TIME SERIES = a sequence of quantities corresponding to successive
points of time such as seconds, minutes, hours, days, weeks, 
months, years, or decades. 

V 

VARIABLE = a value or measure which varies in magnitude for 
separate points of time or location or other characteristic or base. 

VARIANCE = the square of the standard deviation, σ².

W 

W = symbol of weight. 

WEIGHTED ARITHMETIC MEAN = a mean in the computation of 
which the several items are included as many times as there are 
units in another number called the weight. The weights indicate
the relative importance of the items. 

X 

X = an item of data.

$\bar{X}$ = symbol of arithmetic mean.

x = symbol of deviation from the mean.

x' = symbol of deviation from an arbitrary origin.

X-AXIS = the horizontal base of a chart or graph.

Y 

Y = symbol of an item of data.

$\bar{Y}$ = symbol of mean of data.
y = symbol of deviation from arithmetic mean.
y' = symbol of deviation from arbitrary origin.

Y-AXIS = the vertical base of a chart or graph.



APPENDIX TABLE I

I. Ordinates of the Normal Probability Curve

Expressed as fractional parts of the mean ordinate $y_o$. Each ordinate is erected at a
given distance from the mean. The height of the ordinate erected at the mean can be
computed from

$y_o = \frac{N}{\sigma\sqrt{2\pi}} = \frac{N}{2.5066\sigma}$

The corresponding height of any other ordinate can be read from the table by assigning
the distance that the ordinate is from the mean (x). Distances on x are measured as
fractional parts of σ. Thus the height of an ordinate at a distance from the mean of
.7σ will be .78270$y_o$; the height of an ordinate at 2.15σ from the mean will be .09914$y_o$,
etc.


x/σ       0      1      2      3      4      5      6      7      8      9
0.0  100000  99995  99980  99955  99920  99875  99820  99755  99685  99596
0.1   99501  99396  99283  99158  99025  98881  98728  98565  98393  98211
0.2   98020  97819  97609  97390  97161  96923  96676  96420  96156  95882
0.3   95600  95309  95010  94702  94387  94055  93723  93382  93024  92677
0.4   92312  91939  91558  91169  90774  90371  89961  89543  89119  88688
0.5   88250  87805  87353  86896  86432  85962  85488  85006  84519  84060
0.6   83527  83023  82514  82010  81481  80957  80429  79896  79359  78817
0.7   78270  77721  77167  76610  76048  75484  74916  74342  73769  73193
0.8   72615  72033  71448  70861  70272  69681  69087  68493  67896  67298
0.9   66689  66097  65494  64891  64287  63683  63077  62472  61865  61259
1.0   60653  60047  59440
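The tabled ratios follow the normal-curve equation y = y₀e^(-x²/2σ²), so any entry can be regenerated from x/σ alone. The sketch below assumes N is the total frequency in the formula for y₀; the function names and the N = 1000, σ = 10 example are illustrative.

```python
import math

def ordinate_ratio(x_over_sigma):
    """Height of the ordinate at distance x/sigma from the mean, as a
    fraction of the mean ordinate y0: exp(-(x/sigma)**2 / 2)."""
    return math.exp(-x_over_sigma ** 2 / 2)

def mean_ordinate(n, sigma):
    """y0 = N / (sigma * sqrt(2*pi)), with N taken as the total frequency."""
    return n / (sigma * math.sqrt(2 * math.pi))

print(f"{ordinate_ratio(0.7):.5f}")      # 0.78270, as in the table
print(f"{ordinate_ratio(2.15):.5f}")     # 0.09914
print(f"{mean_ordinate(1000, 10):.2f}")  # 39.89 for hypothetical N = 1000, sigma = 10
```

Multiplying `ordinate_ratio(x/σ)` by `mean_ordinate(N, σ)` gives the actual height of any ordinate for a given distribution.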
¹ The figures in the body of the table are values of r corresponding to z-values read
from the scales on the left and top of the table.
z         0      1      2      3      4      5      6      7      8      9
2.2       …  .9762  .9767  .9771  .9776  .9780  .9785  .9789  .9793  .9797
2.3   .9801  .9805  .9809  .9812  .9816  .9820  .9823  .9827  .9830  .9834
2.4   .9837  .9840  .9843  .9846  .9849  .9852  .9855  .9858  .9861  .9863
2.5   .9866  .9869  .9871  .9874  .9876  .9879  .9881  .9884  .9886  .9888
2.6   .9890  .9892  .9895  .9897  .9899  .9901  .9903  .9905  .9906  .9908
2.7   .9910  .9912  .9914  .9915  .9917  .9919  .9920  .9922  .9923  .9925
2.8   .9926  .9928  .9929  .9931  .9932  .9933  .9935  .9936  .9937  .9938
2.9   .9940  .9941  .9942  .9943  .9944  .9945  .9946  .9947  .9949  .9950
3.0   .9951
4.0   .9993
5.0   .9999
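Assuming the z-values here are Fisher's z, each entry is the hyperbolic tangent of z, r = (e^(2z) - 1)/(e^(2z) + 1), and the reverse step is z = ½ ln((1 + r)/(1 - r)). The sketch below regenerates a few entries of the table above; the function names are illustrative.

```python
import math

def r_from_z(z):
    """Correlation r corresponding to Fisher's z: r = tanh(z)."""
    return math.tanh(z)

def z_from_r(r):
    """Inverse transformation: z = (1/2) * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

for z in (2.3, 2.5, 3.0, 4.0):
    print(z, f"{r_from_z(z):.4f}")   # .9801, .9866, .9951, .9993 as tabled
```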
















APPENDIX TABLE IV 


IV. Squares, Square Roots, and Reciprocals to 1000¹


No.    Square   Square Root   Reciprocal ×100
  1         1    1.0000000   100.0000000
  2         4    1.4142136    50.0000000
  3         9    1.7320508    33.3333333
  4        16    2.0000000    25.0000000
  5        25    2.2360680    20.0000000
  6        36    2.4494897    16.6666667
  7        49    2.6457513    14.2857143
  8        64    2.8284271    12.5000000
  9        81    3.0000000    11.1111111
 10      1 00    3.1622777    10.0000000
 11      1 21    3.3166248     9.0909091
 12      1 44    3.4641016     8.3333333
 13      1 69    3.6055513     7.6923077
 14      1 96    3.7416574     7.1428571
 15      2 25    3.8729833     6.6666667
 16      2 56    4.0000000     6.2500000
 17      2 89    4.1231056     5.8823529
 18      3 24    4.2426407     5.5555556
 19      3 61    4.3588989     5.2631579
 20      4 00    4.4721360     5.0000000
 21      4 41    4.5825757     4.7619048
 22      4 84    4.6904158     4.5454545
 23      5 29    4.7958315     4.3478261
 24      5 76    4.8989795     4.1666667
 25      6 25    5.0000000     4.0000000
 26      6 76    5.0990195     3.8461538
 27      7 29    5.1961524     3.7037037
 28      7 84    5.2915026     3.5714286
 29      8 41    5.3851648     3.4482759
 30      9 00    5.4772256     3.3333333
 31      9 61    5.5677644     3.2258065
 32     10 24    5.6568542     3.1250000
 33     10 89    5.7445626     3.0303030
 34     11 56    5.8309519     2.9411765
 35     12 25    5.9160798     2.8571429
 36     12 96    6.0000000     2.7777778
 37     13 69    6.0827625     2.7027027
 38     14 44    6.1644140     2.6315789
 39     15 21    6.2449980     2.5641026
 40     16 00    6.3245553     2.5000000
 41     16 81    6.4031242     2.4390244
 42     17 64    6.4807407     2.3809524
 43     18 49    6.5574385     2.3255814
 44     19 36    6.6332496     2.2727273
 45     20 25    6.7082039     2.2222222
 46     21 16    6.7823300     2.1739130
 47     22 09    6.8556546     2.1276596
 48     23 04    6.9282032     2.0833333
 49     24 01    7.0000000     2.0408163
 50     25 00    7.0710678     2.0000000
 51     26 01    7.1414284     1.9607843
 52     27 04    7.2111026     1.9230769
 53     28 09    7.2801099     1.8867925
 54     29 16    7.3484692     1.8518519
 55     30 25    7.4161985     1.8181818
 56     31 36    7.4833148     1.7857143
 57     32 49    7.5498344     1.7543860
 58     33 64    7.6157731     1.7241379
 59     34 81    7.6811457     1.6949153
 60     36 00    7.7459667     1.6666667
 61     37 21    7.8102497     1.6393443
 62     38 44    7.8740079     1.6129032
 63     39 69    7.9372539     1.5873016
 64     40 96    8.0000000     1.5625000
 65     42 25    8.0622577     1.5384615
 66     43 56    8.1240384     1.5151515
 67     44 89    8.1853528     1.4925373
 68     46 24    8.2462113     1.4705882
 69     47 61    8.3066239     1.4492754
 70     49 00    8.3666003     1.4285714
 71     50 41    8.4261498     1.4084507
 72     51 84    8.4852814     1.3888889
 73     53 29    8.5440037     1.3698630
 74     54 76    8.6023253     1.3513514
 75     56 25    8.6602540     1.3333333
 76     57 76    8.7177979     1.3157895
 77     59 29    8.7749644     1.2987013
 78     60 84    8.8317609     1.2820513
 79     62 41    8.8881944     1.2658228
 80     64 00    8.9442719     1.2500000
 81     65 61    9.0000000     1.2345679
 82     67 24    9.0553851     1.2195122
 83     68 89    9.1104336     1.2048193
 84     70 56    9.1651514     1.1904762
 85     72 25    9.2195445     1.1764706
 86     73 96    9.2736185     1.1627907
 87     75 69    9.3273791     1.1494253
 88     77 44    9.3808315     1.1363636
 89     79 21    9.4339811     1.1235955
 90     81 00    9.4868330     1.1111111
 91     82 81    9.5393920     1.0989011
 92     84 64    9.5916630     1.0869565
 93     86 49    9.6436508     1.0752688
 94     88 36    9.6953597     1.0638298
 95     90 25    9.7467943     1.0526316
 96     92 16    9.7979590     1.0416667
 97     94 09    9.8488578     1.0309278
 98     96 04    9.8994949     1.0204082
 99     98 01    9.9498744     1.0101010
100   1 00 00   10.0000000     1.0000000


1 Reprinted by permission from Business Statistics by Davies and Yoder, published 
by John Wiley & Sons, Inc. 
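Any entry of Table IV can be regenerated for spot-checking. Judging from the printed entries themselves (an assumption, since the column headings are partly illegible), the reciprocal column is 100/n for n up to 100 and 1/n × 10⁹ with the decimal point omitted for larger n (for example, 1/101 = .009900990 is printed 9900990). The function name is illustrative.

```python
import math

def table_iv_row(n):
    """Return (square, square root to 7 places, reciprocal as printed) for n.

    Assumption: reciprocals are printed as 100/n (7 decimals) for n <= 100
    and as round(10**9 / n) for n > 100, matching the tabled entries.
    """
    square = n * n
    root = round(math.sqrt(n), 7)
    if n <= 100:
        recip = round(100 / n, 7)
    else:
        recip = round(10**9 / n)
    return square, root, recip

for n in (2, 54, 242, 717):
    print(n, *table_iv_row(n))
```

A check like this is a quick way to catch misprinted or misread digits in a long printed table.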







Squares, Square Roots, and Reciprocals to 1000 (Cont.) 


No.     Square   Square Root   Reciprocal ×10⁹
101    1 02 01   10.0498756   9900990
102    1 04 04   10.0995049   9803922
103    1 06 09   10.1488916   9708738
104    1 08 16   10.1980390   9615385
105    1 10 25   10.2469508   9523810
106    1 12 36   10.2956301   9433962
107    1 14 49   10.3440804   9345794
108    1 16 64   10.3923048   9259259
109    1 18 81   10.4403065   9174312
110    1 21 00   10.4880885   9090909
111    1 23 21   10.5356538   9009009
112    1 25 44   10.5830052   8928571
113    1 27 69   10.6301458   8849558
114    1 29 96   10.6770783   8771930
115    1 32 25   10.7238053   8695652
116    1 34 56   10.7703296   8620690
117    1 36 89   10.8166538   8547009
118    1 39 24   10.8627805   8474576
119    1 41 61   10.9087121   8403361
120    1 44 00   10.9544512   8333333
121    1 46 41   11.0000000   8264463
122    1 48 84   11.0453610   8196721
123    1 51 29   11.0905365   8130081
124    1 53 76   11.1355287   8064516
125    1 56 25   11.1803399   8000000
126    1 58 76   11.2249722   7936508
127    1 61 29   11.2694277   7874016
128    1 63 84   11.3137085   7812500
129    1 66 41   11.3578167   7751938
130    1 69 00   11.4017543   7692308
131    1 71 61   11.4455231   7633588
132    1 74 24   11.4891253   7575758
133    1 76 89   11.5325626   7518797
134    1 79 56   11.5758369   7462687
135    1 82 25   11.6189500   7407407
136    1 84 96   11.6619038   7352941
137    1 87 69   11.7046999   7299270
138    1 90 44   11.7473401   7246377
139    1 93 21   11.7898261   7194245
140    1 96 00   11.8321596   7142857
141    1 98 81   11.8743421   7092199
142    2 01 64   11.9163753   7042254
143    2 04 49   11.9582607   6993007
144    2 07 36   12.0000000   6944444
145    2 10 25   12.0415946   6896552
146    2 13 16   12.0830460   6849315
147    2 16 09   12.1243557   6802721
148    2 19 04   12.1655251   6756757
149    2 22 01   12.2065556   6711409
150    2 25 00   12.2474487   6666667
151    2 28 01   12.2882057   6622517
152    2 31 04   12.3288280   6578947
153    2 34 09   12.3693169   6535948
154    2 37 16   12.4096736   6493506
155    2 40 25   12.4498996   6451613
156    2 43 36   12.4899960   6410256
157    2 46 49   12.5299641   6369427
158    2 49 64   12.5698051   6329114
159    2 52 81   12.6095202   6289308
160    2 56 00   12.6491106   6250000
161    2 59 21   12.6885775   6211180
162    2 62 44   12.7279221   6172840
163    2 65 69   12.7671453   6134969
164    2 68 96   12.8062485   6097561
165    2 72 25   12.8452326   6060606
166    2 75 56   12.8840987   6024096
167    2 78 89   12.9228480   5988024
168    2 82 24   12.9614814   5952381
169    2 85 61   13.0000000   5917160
170    2 89 00   13.0384048   5882353
171    2 92 41   13.0766968   5847953
172    2 95 84   13.1148770   5813953
173    2 99 29   13.1529464   5780347
174    3 02 76   13.1909060   5747126
175    3 06 25   13.2287566   5714286
176    3 09 76   13.2664992   5681818
177    3 13 29   13.3041347   5649718
178    3 16 84   13.3416641   5617978
179    3 20 41   13.3790882   5586592
180    3 24 00   13.4164079   5555556
181    3 27 61   13.4536240   5524862
182    3 31 24   13.4907376   5494505
183    3 34 89   13.5277493   5464481
184    3 38 56   13.5646600   5434783
185    3 42 25   13.6014705   5405405
186    3 45 96   13.6381817   5376344
187    3 49 69   13.6747943   5347594
188    3 53 44   13.7113092   5319149
189    3 57 21   13.7477271   5291005
190    3 61 00   13.7840488   5263158
191    3 64 81   13.8202750   5235602
192    3 68 64   13.8564065   5208333
193    3 72 49   13.8924440   5181347
194    3 76 36   13.9283883   5154639
195    3 80 25   13.9642400   5128205
196    3 84 16   14.0000000   5102041
197    3 88 09   14.0356688   5076142
198    3 92 04   14.0712473   5050505
199    3 96 01   14.1067360   5025126
200    4 00 00   14.1421356   5000000





IV. Squares, Square Roots, and Reciprocals to 1000 (Cont.) 


No. Square Square Root Reciprocal ×10⁹

201 4 04 01 14.1774469 4975124 

202 4 08 04 14.2126704 4950495 

203 4 12 09 14.2478068 4926108 

204 4 16 16 14.2828569 4901961 

205 4 20 25 14.3178211 4878049 

206 4 24 36 14.3527001 4854369 

207 4 28 49 14.3874946 4830918 

208 4 32 64 14.4222051 4807692 

209 4 36 81 14.4568323 4784689 

210 4 41 00 14.4913767 4761905

211 4 45 21 14.5258390 4739336 

212 4 49 44 14.5602198 4716981 

213 4 53 69 14.5945195 4694836 

214 4 57 96 14.6287388 4672897 

215 4 62 25 14.6628783 4651163 

216 4 66 56 14.6969385 4629630 

217 4 70 89 14.7309199 4608295 

218 4 75 24 14.7648231 4587156 

219 4 79 61 14.7986486 4566210 

220 4 84 00 14.8323970 4545455 

221 4 88 41 14.8660687 4524887

222 4 92 84 14.8996644 4504505 

223 4 97 29 14.9331845 4484305 

224 5 01 76 14.9666295 4464286 

225 5 06 25 15.0000000 4444444 

226 5 10 76 15.0332964 4424779 

227 5 15 29 15.0665192 4405286 

228 5 19 84 15.0996689 4385965 

229 5 24 41 15.1327460 4366812 

230 5 29 00 15.1657509 4347826 

231 5 33 61 15.1986842 4329004 

232 5 38 24 15.2315462 4310345 

233 5 42 89 15.2643375 4291845 

234 5 47 56 15.2970585 4273504 

235 5 52 25 15.3297097 4255319 

236 5 56 96 15.3622915 4237288 

237 5 61 69 15.3948043 4219409 

238 5 66 44 15.4272486 4201681 

239 5 71 21 15.4596248 4184100 

240 5 76 00 15.4919334 4166667 

241 5 80 81 15.5241747 4149378 

242 5 85 64 15.5563492 4132231

243 5 90 49 15.5884573 4115226 

244 5 95 36 15.6204994 4098361 

245 6 00 25 15.6524758 4081633 

246 6 05 16 15.6843871 4065041 

247 6 10 09 15.7162336 4048583 

248 6 15 04 15.7480157 4032258 

249 6 20 01 15.7797338 4016064 

250 6 25 00 15.8113883 4000000 


No.     Square   Square Root   Reciprocal ×10⁹
251    6 30 01   15.8429795   3984064
252    6 35 04   15.8745079   3968254
253    6 40 09   15.9059737   3952569
254    6 45 16   15.9373775   3937008
255    6 50 25   15.9687194   3921569
256    6 55 36   16.0000000   3906250
257    6 60 49   16.0312195   3891051
258    6 65 64   16.0623784   3875969
259    6 70 81   16.0934769   3861004
260    6 76 00   16.1245155   3846154
261    6 81 21   16.1554944   3831418
262    6 86 44   16.1864141   3816794
263    6 91 69   16.2172747   3802281
264    6 96 96   16.2480768   3787879
265    7 02 25   16.2788206   3773585
266    7 07 56   16.3095064   3759398
267    7 12 89   16.3401346   3745318
268    7 18 24   16.3707055   3731343
269    7 23 61   16.4012195   3717472
270    7 29 00   16.4316767   3703704
271    7 34 41   16.4620776   3690037
272    7 39 84   16.4924225   3676471
273    7 45 29   16.5227116   3663004
274    7 50 76   16.5529454   3649635
275    7 56 25   16.5831240   3636364
276    7 61 76   16.6132477   3623188
277    7 67 29   16.6433170   3610108
278    7 72 84   16.6733320   3597122
279    7 78 41   16.7032931   3584229
280    7 84 00   16.7332005   3571429
281    7 89 61   16.7630546   3558719
282    7 95 24   16.7928556   3546099
283    8 00 89   16.8226038   3533569
284    8 06 56   16.8522995   3521127
285    8 12 25   16.8819430   3508772
286    8 17 96   16.9115345   3496503
287    8 23 69   16.9410743   3484321
288    8 29 44   16.9705627   3472222
289    8 35 21   17.0000000   3460208
290    8 41 00   17.0293864   3448276
291    8 46 81   17.0587221   3436426
292    8 52 64   17.0880075   3424658
293    8 58 49   17.1172428   3412969
294    8 64 36   17.1464282   3401361
295    8 70 25   17.1755640   3389831
296    8 76 16   17.2046505   3378378
297    8 82 09   17.2336879   3367003
298    8 88 04   17.2626765   3355705
299    8 94 01   17.2916165   3344482
300    9 00 00   17.3205081   3333333
Squares, Square Roots, and Reciprocals to 1000 (Cont.)


No.     Square   Square Root   Reciprocal ×10⁹
301    9 06 01   17.3493516   3322259
302    9 12 04   17.3781472   3311258
303    9 18 09   17.4068952   3300330
304    9 24 16   17.4355958   3289474
305    9 30 25   17.4642492   3278689
306    9 36 36   17.4928557   3267974
307    9 42 49   17.5214155   3257329
308    9 48 64   17.5499288   3246753
309    9 54 81   17.5783958   3236246
310    9 61 00   17.6068169   3225806
311    9 67 21   17.6351921   3215434
312    9 73 44   17.6635217   3205128
313    9 79 69   17.6918060   3194888
314    9 85 96   17.7200451   3184713
315    9 92 25   17.7482393   3174603
316    9 98 56   17.7763888   3164557
317   10 04 89   17.8044938   3154574
318   10 11 24   17.8325545   3144654
319   10 17 61   17.8605711   3134796
320   10 24 00   17.8885438   3125000
321   10 30 41   17.9164729   3115265
322   10 36 84   17.9443584   3105590
323   10 43 29   17.9722008   3095975
324   10 49 76   18.0000000   3086420
325   10 56 25   18.0277564   3076923
326   10 62 76   18.0554701   3067485
327   10 69 29   18.0831413   3058104
328   10 75 84   18.1107703   3048780
329   10 82 41   18.1383571   3039514
330   10 89 00   18.1659021   3030303
331   10 95 61   18.1934054   3021148
332   11 02 24   18.2208672   3012048
333   11 08 89   18.2482876   3003003
334   11 15 56   18.2756669   2994012
335   11 22 25   18.3030052   2985075
336   11 28 96   18.3303028   2976190
337   11 35 69   18.3575598   2967359
338   11 42 44   18.3847763   2958580
339   11 49 21   18.4119526   2949853
340   11 56 00   18.4390889   2941176
341   11 62 81   18.4661853   2932551
342   11 69 64   18.4932420   2923977
343   11 76 49   18.5202592   2915452
344   11 83 36   18.5472370   2906977
345   11 90 25   18.5741756   2898551
346   11 97 16   18.6010752   2890173
347   12 04 09   18.6279360   2881844
348   12 11 04   18.6547581   2873563
349   12 18 01   18.6815417   2865330
350   12 25 00   18.7082869   2857143


No. Square Square Root Reciprocal ×10⁹

351 12 32 01 18.7349940 2849003

352 12 39 04 18.7616630 2840909

353 12 46 09 18.7882942 2832861

354 12 53 16 18.8148877 2824859

355 12 60 25 18.8414437 2816901 

356 12 67 36 18.8679623 2808989 

357 12 74 49 18.8944436 2801120 

358 12 81 64 18.9208879 2793296 

359 12 88 81 18.9472953 2785515 

360 12 96 00 18.9736660 2777778 

361 13 03 21 19.0000000 2770083 

362 13 10 44 19.0262976 2762431 

363 13 17 69 19.0525589 2754821 

364 13 24 96 19.0787840 2747253 

365 13 32 25 19.1049732 2739726 

366 13 39 56 19.1311265 2732240 

367 13 46 89 19.1572441 2724796 

368 13 54 24 19.1833261 2717391 

369 13 61 61 19.2093727 2710027 

370 13 69 00 19.2353841 2702703 

371 13 76 41 19.2613603 2695418 

372 13 83 84 19.2873015 2688172 

373 13 91 29 19.3132079 2680965 

374 13 98 76 19.3390796 2673797 

375 14 06 25 19.3649167 2666667 

376 14 13 76 19.3907194 2659574 

377 14 21 29 19.4164878 2652520 

378 14 28 84 19.4422221 2645503 

379 14 36 41 19.4679223 2638522 

380 14 44 00 19.4935887 2631579 

381 14 51 61 19.5192213 2624672 

382 14 59 24 19.5448203 2617801 

383 14 66 89 19.5703858 2610966

384 14 74 56 19.5959179 2604167

385 14 82 25 19.6214169 2597403

386 14 89 96 19.6468827 2590674

387 14 97 69 19.6723156 2583979

388 15 05 44 19.6977156 2577320

389 15 13 21 19.7230829 2570694 

390 15 21 00 19.7484177 2564103 

391 15 28 81 19.7737199 2557545 

392 15 36 64 19.7989899 2551020

393 15 44 49 19.8242276 2544529 

394 15 52 36 19.8494332 2538071 

395 15 60 25 19.8746069 2531646 

396 15 68 16 19.8997487 2525253 

397 15 76 09 19.9248588 2518892 

398 15 84 04 19.9499373 2512563 

399 15 92 01 19.9749844 2506266 

400 16 00 00 20.0000000 2500000 
IV. Squares, Square Roots, and Reciprocals to 1000 (Cont.)


No.     Square   Square Root   Reciprocal ×10⁹
401   16 08 01   20.0249844   2493766
402   16 16 04   20.0499377   2487562
403   16 24 09   20.0748599   2481390
404   16 32 16   20.0997512   2475248
405   16 40 25   20.1246118   2469136
406   16 48 36   20.1494417   2463054
407   16 56 49   20.1742410   2457002
408   16 64 64   20.1990099   2450980
409   16 72 81   20.2237484   2444988
410   16 81 00   20.2484567   2439024
411   16 89 21   20.2731349   2433090
412   16 97 44   20.2977831   2427184
413   17 05 69   20.3224014   2421308
414   17 13 96   20.3469899   2415459
415   17 22 25   20.3715488   2409639
416   17 30 56   20.3960781   2403846
417   17 38 89   20.4205779   2398082
418   17 47 24   20.4450483   2392344
419   17 55 61   20.4694895   2386635
420   17 64 00   20.4939015   2380952
421   17 72 41   20.5182845   2375297
422   17 80 84   20.5426386   2369668
423   17 89 29   20.5669638   2364066
424   17 97 76   20.5912603   2358491
425   18 06 25   20.6155281   2352941
426   18 14 76   20.6397674   2347418
427   18 23 29   20.6639783   2341920
428   18 31 84   20.6881609   2336449
429   18 40 41   20.7123152   2331002
430   18 49 00   20.7364414   2325581
431   18 57 61   20.7605395   2320186
432   18 66 24   20.7846097   2314815
433   18 74 89   20.8086520   2309469
434   18 83 56   20.8326667   2304147
435   18 92 25   20.8566536   2298851
436   19 00 96   20.8806130   2293578
437   19 09 69   20.9045450   2288330
438   19 18 44   20.9284495   2283105
439   19 27 21   20.9523268   2277904
440   19 36 00   20.9761770   2272727
441   19 44 81   21.0000000   2267574
442   19 53 64   21.0237960   2262443
443   19 62 49   21.0475652   2257336
444   19 71 36   21.0713075   2252252
445   19 80 25   21.0950231   2247191
446   19 89 16   21.1187121   2242152
447   19 98 09   21.1423745   2237136
448   20 07 04   21.1660105   2232143
449   20 16 01   21.1896201   2227171
450   20 25 00   21.2132034   2222222


No.     Square   Square Root   Reciprocal ×10⁹
451   20 34 01   21.2367606   2217295
452   20 43 04   21.2602916   2212389
453   20 52 09   21.2837967   2207506
454   20 61 16   21.3072758   2202643
455   20 70 25   21.3307290   2197802
456   20 79 36   21.3541565   2192982
457   20 88 49   21.3775583   2188184
458   20 97 64   21.4009346   2183406
459   21 06 81   21.4242853   2178649
460   21 16 00   21.4476106   2173913
461   21 25 21   21.4709106   2169197
462   21 34 44   21.4941853   2164502
463   21 43 69   21.5174348   2159827
464   21 52 96   21.5406592   2155172
465   21 62 25   21.5638587   2150538
466   21 71 56   21.5870331   2145923
467   21 80 89   21.6101828   2141328
468   21 90 24   21.6333077   2136752
469   21 99 61   21.6564078   2132196
470   22 09 00   21.6794834   2127660
471   22 18 41   21.7025344   2123142
472   22 27 84   21.7255610   2118644
473   22 37 29   21.7485632   2114165
474   22 46 76   21.7715411   2109705
475   22 56 25   21.7944947   2105263
476   22 65 76   21.8174242   2100840
477   22 75 29   21.8403297   2096436
478   22 84 84   21.8632111   2092050
479   22 94 41   21.8860686   2087683
480   23 04 00   21.9089023   2083333
481   23 13 61   21.9317122   2079002
482   23 23 24   21.9544984   2074689
483   23 32 89   21.9772610   2070393
484   23 42 56   22.0000000   2066116
485   23 52 25   22.0227155   2061856
486   23 61 96   22.0454077   2057613
487   23 71 69   22.0680765   2053388
488   23 81 44   22.0907220   2049180
489   23 91 21   22.1133444   2044990
490   24 01 00   22.1359436   2040816
491   24 10 81   22.1585198   2036660
492   24 20 64   22.1810730   2032520
493   24 30 49   22.2036033   2028398
494   24 40 36   22.2261108   2024291
495   24 50 25   22.2485955   2020202
496   24 60 16   22.2710575   2016129
497   24 70 09   22.2934968   2012072
498   24 80 04   22.3159136   2008032
499   24 90 01   22.3383079   2004008
500   25 00 00   22.3606798   2000000
IV. Squares, Square Roots, and Reciprocals to 1000 (Cont.)


No.     Square   Square Root   Reciprocal ×10⁹
501   25 10 01   22.3830293   1996008
502   25 20 04   22.4053565   1992032
503   25 30 09   22.4276615   1988072
504   25 40 16   22.4499443   1984127
505   25 50 25   22.4722051   1980198
506   25 60 36   22.4944438   1976285
507   25 70 49   22.5166605   1972387
508   25 80 64   22.5388553   1968504
509   25 90 81   22.5610283   1964637
510   26 01 00   22.5831796   1960784
511   26 11 21   22.6053091   1956947
512   26 21 44   22.6274170   1953125
513   26 31 69   22.6495033   1949318
514   26 41 96   22.6715681   1945525
515   26 52 25   22.6936114   1941748
516   26 62 56   22.7156334   1937984
517   26 72 89   22.7376340   1934236
518   26 83 24   22.7596134   1930502
519   26 93 61   22.7815715   1926782
520   27 04 00   22.8035085   1923077
521   27 14 41   22.8254244   1919386
522   27 24 84   22.8473193   1915709
523   27 35 29   22.8691933   1912046
524   27 45 76   22.8910463   1908397
525   27 56 25   22.9128785   1904762
526   27 66 76   22.9346899   1901141
527   27 77 29   22.9564806   1897533
528   27 87 84   22.9782506   1893939
529   27 98 41   23.0000000   1890359
530   28 09 00   23.0217289   1886792
531   28 19 61   23.0434372   1883239
532   28 30 24   23.0651252   1879699
533   28 40 89   23.0867928   1876173
534   28 51 56   23.1084400   1872659
535   28 62 25   23.1300670   1869159
536   28 72 96   23.1516738   1865672
537   28 83 69   23.1732605   1862197
538   28 94 44   23.1948270   1858736
539   29 05 21   23.2163735   1855288
540   29 16 00   23.2379001   1851852
541   29 26 81   23.2594067   1848429
542   29 37 64   23.2808935   1845018
543   29 48 49   23.3023604   1841621
544   29 59 36   23.3238076   1838235
545   29 70 25   23.3452351   1834862
546   29 81 16   23.3666429   1831502
547   29 92 09   23.3880311   1828154
548   30 03 04   23.4093998   1824818
549   30 14 01   23.4307490   1821494
550   30 25 00   23.4520788   1818182


No.     Square   Square Root   Reciprocal ×10⁹
551   30 36 01   23.4733892   1814882
552   30 47 04   23.4946802   1811594
553   30 58 09   23.5159520   1808318
554   30 69 16   23.5372046   1805054
555   30 80 25   23.5584380   1801802
556   30 91 36   23.5796522   1798561
557   31 02 49   23.6008474   1795332
558   31 13 64   23.6220236   1792115
559   31 24 81   23.6431808   1788909
560   31 36 00   23.6643191   1785714
561   31 47 21   23.6854386   1782531
562   31 58 44   23.7065392   1779359
563   31 69 69   23.7276210   1776199
564   31 80 96   23.7486842   1773050
565   31 92 25   23.7697286   1769912
566   32 03 56   23.7907545   1766784
567   32 14 89   23.8117618   1763668
568   32 26 24   23.8327506   1760563
569   32 37 61   23.8537209   1757469
570   32 49 00   23.8746728   1754386
571   32 60 41   23.8956063   1751313
572   32 71 84   23.9165215   1748252
573   32 83 29   23.9374184   1745201
574   32 94 76   23.9582971   1742160
575   33 06 25   23.9791576   1739130
576   33 17 76   24.0000000   1736111
577   33 29 29   24.0208243   1733102
578   33 40 84   24.0416306   1730104
579   33 52 41   24.0624188   1727116
580   33 64 00   24.0831892   1724138
581   33 75 61   24.1039416   1721170
582   33 87 24   24.1246762   1718213
583   33 98 89   24.1453929   1715266
584   34 10 56   24.1660919   1712329
585   34 22 25   24.1867732   1709402
586   34 33 96   24.2074369   1706485
587   34 45 69   24.2280829   1703578
588   34 57 44   24.2487113   1700680
589   34 69 21   24.2693222   1697793
590   34 81 00   24.2899156   1694915
591   34 92 81   24.3104916   1692047
592   35 04 64   24.3310501   1689189
593   35 16 49   24.3515913   1686341
594   35 28 36   24.3721152   1683502
595   35 40 25   24.3926218   1680672
596   35 52 16   24.4131112   1677852
597   35 64 09   24.4335834   1675042
598   35 76 04   24.4540385   1672241
599   35 88 01   24.4744765   1669449
600   36 00 00   24.4948974   1666667






IV. Squares, Square Roots, and Reciprocals to 1000 (Cont.)


No.     Square   Square Root   Reciprocal ×10⁹
601   36 12 01   24.5153013   1663894
602   36 24 04   24.5356883   1661130
603   36 36 09   24.5560583   1658375
604   36 48 16   24.5764115   1655629
605   36 60 25   24.5967478   1652893
606   36 72 36   24.6170673   1650165
607   36 84 49   24.6373700   1647446
608   36 96 64   24.6576560   1644737
609   37 08 81   24.6779254   1642036
610   37 21 00   24.6981781   1639344
611   37 33 21   24.7184142   1636661
612   37 45 44   24.7386338   1633987
613   37 57 69   24.7588368   1631321
614   37 69 96   24.7790234   1628664
615   37 82 25   24.7991935   1626016
616   37 94 56   24.8193473   1623377
617   38 06 89   24.8394847   1620746
618   38 19 24   24.8596058   1618123
619   38 31 61   24.8797106   1615509
620   38 44 00   24.8997992   1612903
621   38 56 41   24.9198716   1610306
622   38 68 84   24.9399278   1607717
623   38 81 29   24.9599679   1605136
624   38 93 76   24.9799920   1602564
625   39 06 25   25.0000000   1600000
626   39 18 76   25.0199920   1597444
627   39 31 29   25.0399681   1594896
628   39 43 84   25.0599282   1592357
629   39 56 41   25.0798724   1589825
630   39 69 00   25.0998008   1587302
631   39 81 61   25.1197134   1584786
632   39 94 24   25.1396102   1582278
633   40 06 89   25.1594913   1579779
634   40 19 56   25.1793566   1577287
635   40 32 25   25.1992063   1574803
636   40 44 96   25.2190404   1572327
637   40 57 69   25.2388589   1569859
638   40 70 44   25.2586619   1567398
639   40 83 21   25.2784493   1564945
640   40 96 00   25.2982213   1562500
641   41 08 81   25.3179778   1560062
642   41 21 64   25.3377189   1557632
643   41 34 49   25.3574447   1555210
644   41 47 36   25.3771551   1552795
645   41 60 25   25.3968502   1550388
646   41 73 16   25.4165301   1547988
647   41 86 09   25.4361947   1545595
648   41 99 04   25.4558441   1543210
649   42 12 01   25.4754784   1540832
650   42 25 00   25.4950976   1538462
651   42 38 01   25.5147016   1536098
652   42 51 04   25.5342907   1533742
653   42 64 09   25.5538647   1531394
654   42 77 16   25.5734237   1529052
655   42 90 25   25.5929678   1526718
656   43 03 36   25.6124969   1524390
657   43 16 49   25.6320112   1522070
658   43 29 64   25.6515107   1519757
659   43 42 81   25.6709953   1517451
660   43 56 00   25.6904652   1515152
661   43 69 21   25.7099203   1512859
662   43 82 44   25.7293607   1510574
663   43 95 69   25.7487864   1508296
664   44 08 96   25.7681975   1506024
665   44 22 25   25.7875939   1503759
666   44 35 56   25.8069758   1501502
667   44 48 89   25.8263431   1499250
668   44 62 24   25.8456960   1497006
669   44 75 61   25.8650343   1494768
670   44 89 00   25.8843582   1492537
671   45 02 41   25.9036677   1490313
672   45 15 84   25.9229628   1488095
673   45 29 29   25.9422435   1485884
674   45 42 76   25.9615100   1483680
675   45 56 25   25.9807621   1481481
676   45 69 76   26.0000000   1479290
677   45 83 29   26.0192237   1477105
678   45 96 84   26.0384331   1474926
679   46 10 41   26.0576284   1472754
680   46 24 00   26.0768096   1470588
681   46 37 61   26.0959767   1468429
682   46 51 24   26.1151297   1466276
683   46 64 89   26.1342687   1464129
684   46 78 56   26.1533937   1461988
685   46 92 25   26.1725047   1459854
686   47 05 96   26.1916017   1457726
687   47 19 69   26.2106848   1455604
688   47 33 44   26.2297541   1453488
689   47 47 21   26.2488095   1451379
690   47 61 00   26.2678511   1449275
691   47 74 81   26.2868789   1447178
692   47 88 64   26.3058929   1445087
693   48 02 49   26.3248932   1443001
694   48 16 36   26.3438797   1440922
695   48 30 25   26.3628527   1438849
696   48 44 16   26.3818119   1436782
697   48 58 09   26.4007576   1434720
698   48 72 04   26.4196896   1432665
699   48 86 01   26.4386081   1430615
700   49 00 00   26.4575131   1428571
Squares, Square Roots, and Reciprocals to 1000 (Cont.)


No.     Square   Square Root   Reciprocal ×10⁹
701   49 14 01   26.4764046   1426534
702   49 28 04   26.4952826   1424501
703   49 42 09   26.5141472   1422475
704   49 56 16   26.5329983   1420455
705   49 70 25   26.5518361   1418440
706   49 84 36   26.5706605   1416431
707   49 98 49   26.5894716   1414427
708   50 12 64   26.6082694   1412429
709   50 26 81   26.6270539   1410437
710   50 41 00   26.6458252   1408451
711   50 55 21   26.6645833   1406470
712   50 69 44   26.6833281   1404494
713   50 83 69   26.7020598   1402525
714   50 97 96   26.7207784   1400560
715   51 12 25   26.7394839   1398601
716   51 26 56   26.7581763   1396648
717   51 40 89   26.7768557   1394700
718   51 55 24   26.7955220   1392758
719   51 69 61   26.8141754   1390821
720   51 84 00   26.8328157   1388889
  …
751   56 40 01   27.4043792   1331558
752   56 55 04   27.4226184   1329787
753   56 70 09   27.4408455   1328021
754   56 85 16   27.4590604   1326260
755   57 00 25   27.4772633   1324503
756   57 15 36   27.4954542   1322751
757   57 30 49   27.5136330   1321004
758   57 45 64   27.5317998   1319261
759   57 60 81   27.5499546   1317523
760   57 76 00   27.5680975   1315789
761   57 91 21   27.5862284   1314060
762   58 06 44   27.6043475   1312336
763   58 21 69   27.6224546   1310616
764   58 36 96   27.6405499   1308901
765   58 52 25   27.6586334   1307190
766   58 67 56   27.6767050   1305483
767   58 82 89   27.6947648   1303781
768   58 98 24   27.7128129   1302083
769   59 13 61   27.7308492   1300390
770   59 29 00   27.7488739   1298701
771   59 44 41   27.7668868   1297017
772   59 59 84   27.7848880   1295337
773   59 75 29   27.8028775   1293661
774   59 90 76   27.8208555   1291990
775   60 06 25   27.8388218   1290323
776   60 21 76   27.8567766   1288660
777   60 37 29   27.8747197   1287001
778   60 52 84   27.8926514   1285347
779   60 68 41   27.9105716   1283697
780   60 84 00   27.9284801   1282051
781   60 99 61   27.9463772   1280410
782   61 15 24   27.9642629   1278772
783   61 30 89   27.9821372   1277139
784   61 46 56   28.0000000   1275510
785   61 62 25   28.0178515   1273885
786   61 77 96   28.0356915   1272265
787   61 93 69   28.0535203   1270648
788   62 09 44   28.0713377   1269036
789   62 25 21   28.0891438   1267427
790   62 41 00   28.1069386   1265823
791   62 56 81   28.1247222   1264223
792   62 72 64   28.1424946   1262626
793   62 88 49   28.1602557   1261034
794   63 04 36   28.1780056   1259446
795   63 20 25   28.1957444   1257862
796   63 36 16   28.2134720   1256281
797   63 52 09   28.2311884   1254705
798   63 68 04   28.2488938   1253133
799   63 84 01   28.2665881   1251564
800   64 00 00   28.2842712   1250000
Squares, Square Roots, and Reciprocals to 1000 (Cont.)


No.     Square   Square Root   Reciprocal ×10⁹
801   64 16 01   28.3019434   1248439
802   64 32 04   28.3196045   1246883
803   64 48 09   28.3372546   1245330
804   64 64 16   28.3548938   1243781
805   64 80 25   28.3725219   1242236
806   64 96 36   28.3901391   1240695
807   65 12 49   28.4077454   1239157
808   65 28 64   28.4253408   1237624
809   65 44 81   28.4429253   1236094
810   65 61 00   28.4604989   1234568
811   65 77 21   28.4780617   1233046
812   65 93 44   28.4956137   1231527
813   66 09 69   28.5131549   1230012
814   66 25 96   28.5306852   1228501
815   66 42 25   28.5482048   1226994
816   66 58 56   28.5657137   1225490
817   66 74 89   28.5832119   1223990
818   66 91 24   28.6006993   1222494
819   67 07 61   28.6181760   1221001
820   67 24 00   28.6356421   1219512
821   67 40 41   28.6530976   1218027
822   67 56 84   28.6705424   1216545
823   67 73 29   28.6879766   1215067
824   67 89 76   28.7054002   1213592
825   68 06 25   28.7228132   1212121
826   68 22 76   28.7402157   1210654
827   68 39 29   28.7576077   1209190
828   68 55 84   28.7749891   1207729
829   68 72 41   28.7923601   1206273
830   68 89 00   28.8097206   1204819
831   69 05 61   28.8270706   1203369
832   69 22 24   28.8444102   1201923
833   69 38 89   28.8617394   1200480
834   69 55 56   28.8790582   1199041
835   69 72 25   28.8963666   1197605
836   69 88 96   28.9136646   1196172
837   70 05 69   28.9309523   1194743
838   70 22 44   28.9482297   1193317
839   70 39 21   28.9654967   1191895
840   70 56 00   28.9827535   1190476
841   70 72 81   29.0000000   1189061
842   70 89 64   29.0172363   1187648
843   71 06 49   29.0344623   1186240
844   71 23 36   29.0516781   1184834
845   71 40 25   29.0688837   1183432
846   71 57 16   29.0860791   1182033
847   71 74 09   29.1032644   1180638
848   71 91 04   29.1204396   1179245
849   72 08 01   29.1376046   1177856
850   72 25 00   29.1547595   1176471
851   72 42 01   29.1719043   1175088
852   72 59 04   29.1890390   1173709

853 

72 76 09 

29.2061637 

1172333 

854 

72 93 16 

29.2232784 

1170960 

855 

73 10 25 

29.2403830 

1169591 

856 

73 27 36 

29.2574777 

1168224 

857 

73 44 49 

29.2745623 

116686x 

858 

73 61 64 

29.2916370 

1165501 

859 

73 78 81 

29.3087018 

1164144 

860 

73 96 00 

29.3257566 

1162791 

861 

74 13 21 

29.3428015 

1161440 

862 

74 30 44 

29.3598365 

1160093 

863 

74 47 69 

29.3768616 

1158749 

864 

74 64 96 

29.3938769 

1157407 

865 

74 82 25 

29.4108823 

1156069 

866 

74 99 56 

29.4278779 

1154734 

867 

75 16 89 

29.4448637 

1153403 

868 

75 34 24 

29.4618397 

1152074 

869 

75 51 61 

29.4788059 

1150748 

870 

75 69^00 

29.4957624 

1149425 

871 

75 86 41 

29.5127091 

1148106 

872 

76 03 84 

29.5296461 

1146789 

873 

76 21 29 

29.5465734 

1145475 

874 

76 38 76 

29.5634910 

1144165 

875 

76 56 25 

29.5803989 

1 1142857 

876 

76 73 76 

29.5972972 

1 1141553 

877 

76 91 29 

29.6141858 

1140251 

878 

77 08 84 

29.6310648 

1138952 

879 

77 26 41 

29.6479342 

1137656 

880 

77 44 00 

29.6647939 

1136364 

881 

77 61 61 

29.6816442 

1135074 1 

882 

77 79 24 

29.6984848 

1133787 1 

883 

77 96 89 

29.7153159 

1132503 1 

884 

78 14 56 

29.7321375 

1131222 1 

885 

78 32 25 

29.7489496 

1129944 

886 

78 49 96 

29.7657521 

1128668 

887 

78 67 69 

29.7825452 

1127396 

888 

78 85 44 

29,7993289 

1126126 

889 

79 03 21 

29.8161030 

1124859 

890 

79 21 00 

29.8328678 

: 1123596 

891 

79 38 81 

29,8496231 

1122334 

892 

79 56 64 

29.8663690 

1121076 

893 

79 74 49 

29.8831056 

1119821 

894 

79 92 36 

29.8998328 

1118568 

895 

80 10 25 

29.9165506 

1117318 

896 

80 28 16 

29.9332591 

1116071 

897 

80 46 09 

29.9499583 

1114827 

898 

80 64 04 

29.9666481 

1 1113586 

899 

80 82 01 

29.9833287 

[ 1112347 

900 

81 00 00 

30.0000000 

1111111 
 901    81 18 01  30.0166620    1109878
 902    81 36 04  30.0333148    1108647
 903    81 54 09  30.0499584    1107420
 904    81 72 16  30.0665928    1106195
 905    81 90 25  30.0832179    1104972
 906    82 08 36  30.0998339    1103753
 907    82 26 49  30.1164407    1102536
 908    82 44 64  30.1330383    1101322
 909    82 62 81  30.1496269    1100110
 910    82 81 00  30.1662063    1098901
 911    82 99 21  30.1827765    1097695
 912    83 17 44  30.1993377    1096491
 913    83 35 69  30.2158899    1095290
 914    83 53 96  30.2324329    1094092
 915    83 72 25  30.2489669    1092896
 916    83 90 56  30.2654919    1091703
 917    84 08 89  30.2820079    1090513
 918    84 27 24  30.2985148    1089325
 919    84 45 61  30.3150128    1088139
 920    84 64 00  30.3315018    1086957
 921    84 82 41  30.3479818    1085776
 922    85 00 84  30.3644529    1084599
 923    85 19 29  30.3809151    1083424
 924    85 37 76  30.3973683    1082251
 925    85 56 25  30.4138127    1081081
 926    85 74 76  30.4302481    1079914
 927    85 93 29  30.4466747    1078749
 928    86 11 84  30.4630924    1077586
 929    86 30 41  30.4795013    1076426
 930    86 49 00  30.4959014    1075269
 931    86 67 61  30.5122926    1074114
 932    86 86 24  30.5286750    1072961
 933    87 04 89  30.5450487    1071811
 934    87 23 56  30.5614136    1070664
 935    87 42 25  30.5777697    1069519
 936    87 60 96  30.5941171    1068376
 937    87 79 69  30.6104557    1067236
 938    87 98 44  30.6267857    1066098
 939    88 17 21  30.6431069    1064963
 940    88 36 00  30.6594194    1063830
 941    88 54 81  30.6757233    1062699
 942    88 73 64  30.6920185    1061571
 943    88 92 49  30.7083051    1060445
 944    89 11 36  30.7245830    1059322
 945    89 30 25  30.7408523    1058201
 946    89 49 16  30.7571130    1057082
 947    89 68 09  30.7733651    1055966
 948    89 87 04  30.7896086    1054852
 949    90 06 01  30.8058436    1053741
 950    90 25 00  30.8220700    1052632
 951    90 44 01  30.8382879    1051525
 952    90 63 04  30.8544972    1050420
 953    90 82 09  30.8706981    1049318
 954    91 01 16  30.8868904    1048218
 955    91 20 25  30.9030743    1047120
 956    91 39 36  30.9192497    1046025
 957    91 58 49  30.9354166    1044932
 958    91 77 64  30.9515751    1043841
 959    91 96 81  30.9677251    1042753
 960    92 16 00  30.9838668    1041667
 961    92 35 21  31.0000000    1040583
 962    92 54 44  31.0161248    1039501
 963    92 73 69  31.0322413    1038422
 964    92 92 96  31.0483494    1037344
 965    93 12 25  31.0644491    1036269
 966    93 31 56  31.0805405    1035197
 967    93 50 89  31.0966236    1034126
 968    93 70 24  31.1126984    1033058
 969    93 89 61  31.1287648    1031992
 970    94 09 00  31.1448230    1030928
 971    94 28 41  31.1608729    1029866
 972    94 47 84  31.1769145    1028807
 973    94 67 29  31.1929479    1027749
 974    94 86 76  31.2089731    1026694
 975    95 06 25  31.2249900    1025641
 976    95 25 76  31.2409987    1024590
 977    95 45 29  31.2569992    1023541
 978    95 64 84  31.2729915    1022495
 979    95 84 41  31.2889757    1021450
 980    96 04 00  31.3049517    1020408
 981    96 23 61  31.3209195    1019368
 982    96 43 24  31.3368792    1018330
 983    96 62 89  31.3528308    1017294
 984    96 82 56  31.3687743    1016260
 985    97 02 25  31.3847097    1015228
 986    97 21 96  31.4006369    1014199
 987    97 41 69  31.4165561    1013171
 988    97 61 44  31.4324673    1012146
 989    97 81 21  31.4483704    1011122
 990    98 01 00  31.4642654    1010101
 991    98 20 81  31.4801525    1009082
 992    98 40 64  31.4960315    1008065
 993    98 60 49  31.5119025    1007049
 994    98 80 36  31.5277655    1006036
 995    99 00 25  31.5436206    1005025
 996    99 20 16  31.5594677    1004016
 997    99 40 09  31.5753068    1003009
 998    99 60 04  31.5911380    1002004
 999    99 80 01  31.6069613    1001001
1000  1 00 00 00  31.6227766    1000000
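The three columns above follow simple rules, so any row can be checked mechanically. The sketch below is a minimal Python illustration, not the author's procedure: it assumes the printed layout (the square split into two-digit groups from the right, the square root rounded to seven decimals, and the reciprocal equal to 1/n × 10⁹ rounded to the nearest unit). The function name `table_row` is hypothetical.

```python
import math


def table_row(n: int) -> tuple[str, str, str]:
    """Rebuild (square, square root, reciprocal x 10^9) for one table row.

    The layout is inferred from the printed rows: the square is split into
    two-digit groups from the right, the root is rounded to 7 decimals, and
    the reciprocal is 10**9 / n rounded to the nearest integer.
    """
    digits = str(n * n)
    groups = []
    while digits:                         # peel off two digits at a time
        groups.append(digits[-2:])
        digits = digits[:-2]
    square = " ".join(reversed(groups))   # e.g. 641601 -> "64 16 01"

    root = f"{math.sqrt(n):.7f}"          # e.g. "28.3019434"

    # Nearest-integer rounding done in exact integer arithmetic.
    reciprocal = str((10**9 + n // 2) // n)
    return square, root, reciprocal


if __name__ == "__main__":
    for n in (801, 900, 1000):
        print(n, *table_row(n))
```

Running it for 801 reproduces the printed entries 64 16 01, 28.3019434, and 1248439; comparing a printed row against the computed one is a quick way to spot a misprint.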
APPENDIX TABLE V

COMMON LOGARITHMS OF NUMBERS

FROM 1 TO 10000
TO FIVE DECIMAL PLACES
1-100


  N  Log           N  Log           N  Log           N  Log           N  Log
  0  —            20  1.30 103     40  1.60 206     60  1.77 815     80  1.90 309
  1  0.00 000     21  1.32 222     41  1.61 278     61  1.78 533     81  1.90 849
  2  0.30 103     22  1.34 242     42  1.62 325     62  1.79 239     82  1.91 381
  3  0.47 712     23  1.36 173     43  1.63 347     63  1.79 934     83  1.91 908
  4  0.60 206     24  1.38 021     44  1.64 345     64  1.80 618     84  1.92 428
  5  0.69 897     25  1.39 794     45  1.65 321     65  1.81 291     85  1.92 942
  6  0.77 815     26  1.41 497     46  1.66 276     66  1.81 954     86  1.93 450
  7  0.84 510     27  1.43 136     47  1.67 210     67  1.82 607     87  1.93 952
  8  0.90 309     28  1.44 716     48  1.68 124     68  1.83 251     88  1.94 448
  9  0.95 424     29  1.46 240     49  1.69 020     69  1.83 885     89  1.94 939
 10  1.00 000     30  1.47 712     50  1.69 897     70  1.84 510     90  1.95 424
 11  1.04 139     31  1.49 136     51  1.70 757     71  1.85 126     91  1.95 904
 12  1.07 918     32  1.50 515     52  1.71 600     72  1.85 733     92  1.96 379
 13  1.11 394     33  1.51 851     53  1.72 428     73  1.86 332     93  1.96 848
 14  1.14 613     34  1.53 148     54  1.73 239     74  1.86 923     94  1.97 313
 15  1.17 609     35  1.54 407     55  1.74 036     75  1.87 506     95  1.97 772
 16  1.20 412     36  1.55 630     56  1.74 819     76  1.88 081     96  1.98 227
 17  1.23 045     37  1.56 820     57  1.75 587     77  1.88 649     97  1.98 677
 18  1.25 527     38  1.57 978     58  1.76 343     78  1.89 209     98  1.99 123
 19  1.27 875     39  1.59 106     59  1.77 085     79  1.89 763     99  1.99 564
 20  1.30 103     40  1.60 206     60  1.77 815     80  1.90 309    100  2.00 000
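Each entry in the 1-100 table is log₁₀ N rounded to five decimal places and printed with a space after the second decimal. The minimal Python sketch below reproduces that layout; the spacing rule is read off the printed entries, and `five_place_log` is a hypothetical helper name.

```python
import math


def five_place_log(n: float) -> str:
    """Return log10(n) to five decimals in the table's '1.30 103' style."""
    text = f"{math.log10(n):.5f}"       # e.g. '1.30103'
    return f"{text[:-3]} {text[-3:]}"   # split off the last three digits


if __name__ == "__main__":
    for n in (2, 20, 37, 100):
        print(f"{n:>3}  {five_place_log(n)}")
```

N = 0 has no logarithm, which is why the table shows a dash there; `math.log10(0)` raises `ValueError` for the same reason.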
Prop. Parts (column = tabular difference, row = next figure of N)
      44    43    42    41    40    39    38    37    36    35    34    33    32    31    30
 1   4.4   4.3   4.2   4.1   4.0   3.9   3.8   3.7   3.6   3.5   3.4   3.3   3.2   3.1   3.0
 2   8.8   8.6   8.4   8.2   8.0   7.8   7.6   7.4   7.2   7.0   6.8   6.6   6.4   6.2   6.0
 3  13.2  12.9  12.6  12.3  12.0  11.7  11.4  11.1  10.8  10.5  10.2   9.9   9.6   9.3   9.0
 4  17.6  17.2  16.8  16.4  16.0  15.6  15.2  14.8  14.4  14.0  13.6  13.2  12.8  12.4  12.0
 5  22.0  21.5  21.0  20.5  20.0  19.5  19.0  18.5  18.0  17.5  17.0  16.5  16.0  15.5  15.0
 6  26.4  25.8  25.2  24.6  24.0  23.4  22.8  22.2  21.6  21.0  20.4  19.8  19.2  18.6  18.0
 7  30.8  30.1  29.4  28.7  28.0  27.3  26.6  25.9  25.2  24.5  23.8  23.1  22.4  21.7  21.0
 8  35.2  34.4  33.6  32.8  32.0  31.2  30.4  29.6  28.8  28.0  27.2  26.4  25.6  24.8  24.0
 9  39.6  38.7  37.8  36.9  36.0  35.1  34.2  33.3  32.4  31.5  30.6  29.7  28.8  27.9  27.0

N         0     1     2     3     4     5     6     7     8     9
100  00 000   043   087   130   173   217   260   303   346   389
101  00 432   475   518   561   604   647   689   732   775   817
102  00 860   903   945   988  *030  *072  *115  *157  *199  *242
103  01 284   326   368   410   452   494   536   578   620   662
104  01 703   745   787   828   870   912   953   995  *036  *078
105  02 119   160   202   243   284   325   366   407   449   490
106  02 531   572   612   653   694   735   776   816   857   898
107  02 938   979  *019  *060  *100  *141  *181  *222  *262  *302
108  03 342   383   423   463   503   543   583   623   663   703
109  03 743   782   822   862   902   941   981  *021  *060  *100
110  04 139   179   218   258   297   336   376   415   454   493
111  04 532   571   610   650   689   727   766   805   844   883
112  04 922   961   999  *038  *077  *115  *154  *192  *231  *269
113  05 308   346   385   423   461   500   538   576   614   652
114  05 690   729   767   805   843   881   918   956   994  *032
115  06 070   108   145   183   221   258   296   333   371   408
116  06 446   483   521   558   595   633   670   707   744   781
117  06 819   856   893   930   967  *004  *041  *078  *115  *151
118  07 188   225   262   298   335   372   408   445   482   518
119  07 555   591   628   664   700   737   773   809   846   882
120  07 918   954   990  *027  *063  *099  *135  *171  *207  *243
121  08 279   314   350   386   422   458   493   529   565   600
122  08 636   672   707   743   778   814   849   884   920   955
123  08 991  *026  *061  *096  *132  *167  *202  *237  *272  *307
124  09 342   377   412   447   482   517   552   587   621   656
125  09 691   726   760   795   830   864   899   934   968  *003
126  10 037   072   106   140   175   209   243   278   312   346
127  10 380   415   449   483   517   551   585   619   653   687
128  10 721   755   789   823   857   890   924   958   992  *025
129  11 059   093   126   160   193   227   261   294   327   361
130  11 394   428   461   494   528   561   594   628   661   694
131  11 727   760   793   826   860   893   926   959   992  *024
132  12 057   090   123   156   189   222   254   287   320   352
133  12 385   418   450   483   516   548   581   613   646   678
134  12 710   743   775   808   840   872   905   937   969  *001
135  13 033   066   098   130   162   194   226   258   290   322
136  13 354   386   418   450   481   513   545   577   609   640
137  13 672   704   735   767   799   830   862   893   925   956
138  13 988  *019  *051  *082  *114  *145  *176  *208  *239  *270
139  14 301   333   364   395   426   457   489   520   551   582
140  14 613   644   675   706   737   768   799   829   860   891
141  14 922   953   983  *014  *045  *076  *106  *137  *168  *198
142  15 229   259   290   320   351   381   412   442   473   503
143  15 534   564   594   625   655   685   715   746   776   806
144  15 836   866   897   927   957   987  *017  *047  *077  *107
145  16 137   167   197   227   256   286   316   346   376   406
146  16 435   465   495   524   554   584   613   643   673   702
147  16 732   761   791   820   850   879   909   938   967   997
148  17 026   056   085   114   143   173   202   231   260   289
149  17 319   348   377   406   435   464   493   522   551   580
150  17 609   638   667   696   725   754   782   811   840   869
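In the five-place table, the row label gives the first three figures of N and the column heading the fourth; column 0 prints the first two mantissa figures, which the other columns omit. An asterisk on an entry is read here in the usual way for five-place tables, as an instruction to take the first two figures from the next row; the text does not spell this out, so treat that reading as an assumption. The Python sketch copies rows 102 and 103 from the table above; `ROWS` and `mantissa` are illustrative names.

```python
# Rows 102 and 103 copied from the table: (first two figures, columns 0-9).
ROWS = {
    102: ("00", ["860", "903", "945", "988", "*030",
                 "*072", "*115", "*157", "*199", "*242"]),
    103: ("01", ["284", "326", "368", "410", "452",
                 "494", "536", "578", "620", "662"]),
}


def mantissa(n4: int) -> str:
    """Five-figure mantissa of a four-figure N such as 1025."""
    row, col = divmod(n4, 10)      # row = first three figures, col = fourth
    lead, cells = ROWS[row]
    cell = cells[col]
    if cell.startswith("*"):
        # Assumed convention: '*' means the leading figures come from the next row.
        lead = ROWS[row + 1][0]
        cell = cell[1:]
    return f".{lead}{cell}"


if __name__ == "__main__":
    print(mantissa(1023), mantissa(1025))
```

The characteristic is supplied separately from the number of digits: for N = 1025 it is 3, so log 1025 = 3.01072.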
V. 1500-2009: LOGARITHMS OF NUMBERS [FIVE-PLACE]

Prop. Parts (column = tabular difference, row = next figure of N)
      29    28    27    26    25    24    23    22    21
 1   2.9   2.8   2.7   2.6   2.5   2.4   2.3   2.2   2.1
 2   5.8   5.6   5.4   5.2   5.0   4.8   4.6   4.4   4.2
 3   8.7   8.4   8.1   7.8   7.5   7.2   6.9   6.6   6.3
 4  11.6  11.2  10.8  10.4  10.0   9.6   9.2   8.8   8.4
 5  14.5  14.0  13.5  13.0  12.5  12.0  11.5  11.0  10.5
 6  17.4  16.8  16.2  15.6  15.0  14.4  13.8  13.2  12.6
 7  20.3  19.6  18.9  18.2  17.5  16.8  16.1  15.4  14.7
 8  23.2  22.4  21.6  20.8  20.0  19.2  18.4  17.6  16.8
 9  26.1  25.2  24.3  23.4  22.5  21.6  20.7  19.8  18.9

N         0     1     2     3     4     5     6     7     8     9
150  17 609   638   667   696   725   754   782   811   840   869
151  17 898   926   955   984  *013  *041  *070  *099  *127  *156
152  18 184   213   241   270   298   327   355   384   412   441
153  18 469   498   526   554   583   611   639   667   696   724
154  18 752   780   808   837   865   893   921   949   977  *005
155  19 033   061   089   117   145   173   201   229   257   285
156  19 312   340   368   396   424   451   479   507   535   562
157  19 590   618   645   673   700   728   756   783   811   838
158  19 866   893   921   948   976  *003  *030  *058  *085  *112
159  20 140   167   194   222   249   276   303   330   358   385
160  20 412   439   466   493   520   548   575   602   629   656
161  20 683   710   737   763   790   817   844   871   898   925
162  20 952   978  *005  *032  *059  *085  *112  *139  *165  *192
163  21 219   245   272   299   325   352   378   405   431   458
164  21 484   511   537   564   590   617   643   669   696   722
165  21 748   775   801   827   854   880   906   932   958   985
166  22 011   037   063   089   115   141   167   194   220   246
167  22 272   298   324   350   376   401   427   453   479   505
168  22 531   557   583   608   634   660   686   712   737   763
169  22 789   814   840   866   891   917   943   968   994  *019
170  23 045   070   096   121   147   172   198   223   249   274
171  23 300   325   350   376   401   426   452   477   502   528
172  23 553   578   603   629   654   679   704   729   754   779
173  23 805   830   855   880   905   930   955   980  *005  *030
174  24 055   080   105   130   155   180   204   229   254   279
175  24 304   329   353   378   403   428   452   477   502   527
176  24 551   576   601   625   650   674   699   724   748   773
177  24 797   822   846   871   895   920   944   969   993  *018
178  25 042   066   091   115   139   164   188   212   237   261
179  25 285   310   334   358   382   406   431   455   479   503
180  25 527   551   575   600   624   648   672   696   720   744
181  25 768   792   816   840   864   888   912   935   959   983
182  26 007   031   055   079   102   126   150   174   198   221
183  26 245   269   293   316   340   364   387   411   435   458
184  26 482   505   529   553   576   600   623   647   670   694
185  26 717   741   764   788   811   834   858   881   905   928
186  26 951   975   998  *021  *045  *068  *091  *114  *138  *161
187  27 184   207   231   254   277   300   323   346   370   393
188  27 416   439   462   485   508   531   554   577   600   623
189  27 646   669   692   715   738   761   784   807   830   852
190  27 875   898   921   944   967   989  *012  *035  *058  *081
191  28 103   126   149   171   194   217   240   262   285   307
192  28 330   353   375   398   421   443   466   488   511   533
193  28 556   578   601   623   646   668   691   713   735   758
194  28 780   803   825   847   870   892   914   937   959   981
195  29 003   026   048   070   092   115   137   159   181   203
196  29 226   248   270   292   314   336   358   380   403   425
197  29 447   469   491   513   535   557   579   601   623   645
198  29 667   688   710   732   754   776   798   820   842   863
199  29 885   907   929   951   973   994  *016  *038  *060  *081
200  30 103   125   146   168   190   211   233   255   276   298
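Each Prop. Parts column lists tenths of a tabular difference d: the entry in row k is d × k / 10. The parts are used to interpolate a fifth figure of N by adding the part for that figure to the mantissa of the first four. The Python sketch below works in units of the fifth decimal place; the function names are hypothetical, and halves are rounded up, which is one common hand-computing convention rather than a rule stated in the text. The example uses row 160 above, where 1604 and 1605 have mantissas 20 520 and 20 548 (d = 28).

```python
import math


def prop_part(d: int, k: int) -> float:
    """Entry k (1-9) of the Prop. Parts column headed d, i.e. d * k / 10."""
    return d * k / 10


def interpolate(m_lo: int, m_hi: int, k: int) -> int:
    """Mantissa for fifth figure k, in units of the fifth decimal.

    m_lo and m_hi are consecutive tabular mantissas such as 20520 and 20548.
    Halves are rounded up (an assumed convention).
    """
    return m_lo + math.floor(prop_part(m_hi - m_lo, k) + 0.5)


if __name__ == "__main__":
    print([prop_part(28, k) for k in range(1, 10)])  # the column headed 28
    print(interpolate(20520, 20548, 5))              # N = 16045
```

The interpolated mantissa 20 534 gives log 16045 = 4.20534, with the characteristic 4 taken from the five digits of N.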



V. 2000-2509: LOGARITHMS OF NUMBERS [FIVE-PLACE]

Prop. Parts (column = tabular difference, row = next figure of N)
      22    21    20    19    18    17
 1   2.2   2.1   2.0   1.9   1.8   1.7
 2   4.4   4.2   4.0   3.8   3.6   3.4
 3   6.6   6.3   6.0   5.7   5.4   5.1
 4   8.8   8.4   8.0   7.6   7.2   6.8
 5  11.0  10.5  10.0   9.5   9.0   8.5
 6  13.2  12.6  12.0  11.4  10.8  10.2
 7  15.4  14.7  14.0  13.3  12.6  11.9
 8  17.6  16.8  16.0  15.2  14.4  13.6
 9  19.8  18.9  18.0  17.1  16.2  15.3

N         0     1     2     3     4     5     6     7     8     9
200  30 103   125   146   168   190   211   233   255   276   298
201  30 320   341   363   384   406   428   449   471   492   514
202  30 535   557   578   600   621   643   664   685   707   728
203  30 750   771   792   814   835   856   878   899   920   942
204  30 963   984  *006  *027  *048  *069  *091  *112  *133  *154
205  31 175   197   218   239   260   281   302   323   345   366
206  31 387   408   429   450   471   492   513   534   555   576
207  31 597   618   639   660   681   702   723   744   765   785
208  31 806   827   848   869   890   911   931   952   973   994
209  32 015   035   056   077   098   118   139   160   181   201
210  32 222   243   263   284   305   325   346   366   387   408
211  32 428   449   469   490   510   531   552   572   593   613
212  32 634   654   675   695   715   736   756   777   797   818
213  32 838   858   879   899   919   940   960   980  *001  *021
214  33 041   062   082   102   122   143   163   183   203   224
215  33 244   264   284   304   325   345   365   385   405   425
216  33 445   465   486   506   526   546   566   586   606   626
217  33 646   666   686   706   726   746   766   786   806   826
218  33 846   866   885   905   925   945   965   985  *005  *025
219  34 044   064   084   104   124   143   163   183   203   223
220  34 242   262   282   301   321   341   361   380   400   420
221  34 439   459   479   498   518   537   557   577   596   616
222  34 635   655   674   694   713   733   753   772   792   811
223  34 830   850   869   889   908   928   947   967   986  *005
224  35 025   044   064   083   102   122   141   160   180   199
225  35 218   238   257   276   295   315   334   353   372   392
226  35 411   430   449   468   488   507   526   545   564   583
227  35 603   622   641   660   679   698   717   736   755   774
228  35 793   813   832   851   870   889   908   927   946   965
229  35 984  *003  *021  *040  *059  *078  *097  *116  *135  *154
230  36 173   192   211   229   248   267   286   305   324   342
231  36 361   380   399   418   436   455   474   493   511   530
232  36 549   568   586   605   624   642   661   680   698   717
233  36 736   754   773   791   810   829   847   866   884   903
234  36 922   940   959   977   996  *014  *033  *051  *070  *088
235  37 107   125   144   162   181   199   218   236   254   273
236  37 291   310   328   346   365   383   401   420   438   457
237  37 475   493   511   530   548   566   585   603   621   639
238  37 658   676   694   712   731   749   767   785   803   822
239  37 840   858   876   894   912   931   949   967   985  *003
240  38 021   039   057   075   093   112   130   148   166   184
241  38 202   220   238   256   274   292   310   328   346   364
242  38 382   399   417   435   453   471   489   507   525   543
243  38 561   578   596   614   632   650   668   686   703   721
244  38 739   757   775   792   810   828   846   863   881   899
245  38 917   934   952   970   987  *005  *023  *041  *058  *076
246  39 094   111   129   146   164   182   199   217   235   252
247  39 270   287   305   322   340   358   375   393   410   428
248  39 445   463   480   498   515   533   550   568   585   602
249  39 620   637   655   672   690   707   724   742   759   777
250  39 794   811   829   846   863   881   898   915   933   950






V. 2500-3009: LOGARITHMS OF NUMBERS [FIVE-PLACE]

Prop. Parts (column = tabular difference, row = next figure of N)
      18    17    16    15    14
 1   1.8   1.7   1.6   1.5   1.4
 2   3.6   3.4   3.2   3.0   2.8
 3   5.4   5.1   4.8   4.5   4.2
 4   7.2   6.8   6.4   6.0   5.6
 5   9.0   8.5   8.0   7.5   7.0
 6  10.8  10.2   9.6   9.0   8.4
 7  12.6  11.9  11.2  10.5   9.8
 8  14.4  13.6  12.8  12.0  11.2
 9  16.2  15.3  14.4  13.5  12.6

N         0     1     2     3     4     5     6     7     8     9
250  39 794   811   829   846   863   881   898   915   933   950
251  39 967   985  *002  *019  *037  *054  *071  *088  *106  *123
252  40 140   157   175   192   209   226   243   261   278   295
253  40 312   329   346   364   381   398   415   432   449   466
254  40 483   500   518   535   552   569   586   603   620   637
255  40 654   671   688   705   722   739   756   773   790   807
256  40 824   841   858   875   892   909   926   943   960   976
257  40 993  *010  *027  *044  *061  *078  *095  *111  *128  *145
258  41 162   179   196   212   229   246   263   280   296   313
259  41 330   347   363   380   397   414   430   447   464   481
260  41 497   514   531   547   564   581   597   614   631   647
261  41 664   681   697   714   731   747   764   780   797   814
262  41 830   847   863   880   896   913   929   946   963   979
263  41 996  *012  *029  *045  *062  *078  *095  *111  *127  *144
264  42 160   177   193   210   226   243   259   275   292   308
265  42 325   341   357   374   390   406   423   439   455   472
266  42 488   504   521   537   553   570   586   602   619   635
267  42 651   667   684   700   716   732   749   765   781   797
268  42 813   830   846   862   878   894   911   927   943   959
269  42 975   991  *008  *024  *040  *056  *072  *088  *104  *120
270  43 136   152   169   185   201   217   233   249   265   281
271  43 297   313   329   345   361   377   393   409   425   441
272  43 457   473   489   505   521   537   553   569   584   600
273  43 616   632   648   664   680   696   712   727   743   759
274  43 775   791   807   823   838   854   870   886   902   917
275  43 933   949   965   981   996  *012  *028  *044  *059  *075
276  44 091   107   122   138   154   170   185   201   217   232
277  44 248   264   279   295   311   326   342   358   373   389
278  44 404   420   436   451   467   483   498   514   529   545
279  44 560   576   592   607   623   638   654   669   685   700
280  44 716   731   747   762   778   793   809   824   840   855
281  44 871   886   902   917   932   948   963   979   994  *010
282  45 025   040   056   071   086   102   117   133   148   163
283  45 179   194   209   225   240   255   271   286   301   317
284  45 332   347   362   378   393   408   423   439   454   469
285  45 484   500   515   530   545   561   576   591   606   621
286  45 637   652   667   682   697   712   728   743   758   773
287  45 788   803   818   834   849   864   879   894   909   924
288  45 939   954   969   984  *000  *015  *030  *045  *060  *075
289  46 090   105   120   135   150   165   180   195   210   225
290  46 240   255   270   285   300   315   330   345   359   374
291  46 389   404   419   434   449   464   479   494   509   523
292  46 538   553   568   583   598   613   627   642   657   672
293  46 687   702   716   731   746   761   776   790   805   820
294  46 835   850   864   879   894   909   923   938   953   967
295  46 982   997  *012  *026  *041  *056  *070  *085  *100  *114
296  47 129   144   159   173   188   202   217   232   246   261
297  47 276   290   305   319   334   349   363   378   392   407
298  47 422   436   451   465   480   494   509   524   538   553
299  47 567   582   596   611   625   640   654   669   683   698
300  47 712   727   741   756   770   784   799   813   828   842



V. 3000-3509: LOGARITHMS OF NUMBERS [FIVE-PLACE]

Prop. Parts (column = tabular difference, row = next figure of N)
      15    14    13    12
 1   1.5   1.4   1.3   1.2
 2   3.0   2.8   2.6   2.4
 3   4.5   4.2   3.9   3.6
 4   6.0   5.6   5.2   4.8
 5   7.5   7.0   6.5   6.0
 6   9.0   8.4   7.8   7.2
 7  10.5   9.8   9.1   8.4
 8  12.0  11.2  10.4   9.6
 9  13.5  12.6  11.7  10.8

N         0     1     2     3     4     5     6     7     8     9
300  47 712   727   741   756   770   784   799   813   828   842
301  47 857   871   885   900   914   929   943   958   972   986
302  48 001   015   029   044   058   073   087   101   116   130
303  48 144   159   173   187   202   216   230   244   259   273
304  48 287   302   316   330   344   359   373   387   401   416
305  48 430   444   458   473   487   501   515   530   544   558
306  48 572   586   601   615   629   643   657   671   686   700
307  48 714   728   742   756   770   785   799   813   827   841
308  48 855   869   883   897   911   926   940   954   968   982
309  48 996  *010  *024  *038  *052  *066  *080  *094  *108  *122
310  49 136   150   164   178   192   206   220   234   248   262
311  49 276   290   304   318   332   346   360   374   388   402
312  49 415   429   443   457   471   485   499   513   527   541
313  49 554   568   582   596   610   624   638   651   665   679
314  49 693   707   721   734   748   762   776   790   803   817
315  49 831   845   859   872   886   900   914   927   941   955
316  49 969   982   996  *010  *024  *037  *051  *065  *079  *092
317  50 106   120   133   147   161   174   188   202   215   229
318  50 243   256   270   284   297   311   325   338   352   365
319  50 379   393   406   420   433   447   461   474   488   501
320  50 515   529   542   556   569   583   596   610   623   637
321  50 651   664   678   691   705   718   732   745   759   772
322  50 786   799   813   826   840   853   866   880   893   907
323  50 920   934   947   961   974   987  *001  *014  *028  *041
324  51 055   068   081   095   108   121   135   148   162   175
325  51 188   202   215   228   242   255   268   282   295   308
326  51 322   335   348   362   375   388   402   415   428   441
327  51 455   468   481   495   508   521   534   548   561   574
328  51 587   601   614   627   640   654   667   680   693   706
329  51 720   733   746   759   772   786   799   812   825   838
330  51 851   865   878   891   904   917   930   943   957   970
331  51 983   996  *009  *022  *035  *048  *061  *075  *088  *101
332  52 114   127   140   153   166   179   192   205   218   231
333  52 244   257   270   284   297   310   323   336   349   362
334  52 375   388   401   414   427   440   453   466   479   492
335  52 504   517   530   543   556   569   582   595   608   621
336  52 634   647   660   673   686   699   711   724   737   750
337  52 763   776   789   802   815   827   840   853   866   879
338  52 892   905   917   930   943   956   969   982   994  *007
339  53 020   033   046   058   071   084   097   110   122   135
340  53 148   161   173   186   199   212   224   237   250   263
341  53 275   288   301   314   326   339   352   364   377   390
342  53 403   415   428   441   453   466   479   491   504   517
343  53 529   542   555   567   580   593   605   618   631   643
344  53 656   668   681   694   706   719   732   744   757   769
345  53 782   794   807   820   832   845   857   870   882   895
346  53 908   920   933   945   958   970   983   995  *008  *020
347  54 033   045   058   070   083   095   108   120   133   145
348  54 158   170   183   195   208   220   233   245   258   270
349  54 283   295   307   320   332   345   357   370   382   394
350  54 407   419   432   444   456   469   481   494   506   518




V. LOGARITHMS OF NUMBERS, 3500-4009 [FIVE-PLACE]

N    0    1    2    3    4    5    6    7    8    9    Prop. Parts



350

54 407

419 

432 

444 

456 

469 

481 

494 

506 

518 



351 

54 531 

543 

555 

568

580 

593 

605 

617 

630 

642 



352 

54 654 

667 

679 

691 

704 

716 

728 

741 

753 

765 



353 

54 777 

790 

802 

814 

827 

839 

851 

864 

876 

888 


13 

354 

54 900 

913 

925 

937 

949 

962 

974 

986 

998 

*011 

1

1.3 

355 

55 023 

035 

047 

060 

072 

084 

096 

108 

121 

133 

2 

2.6

356 

55 145 

157 

169 

182 

194 

206 

218 

230 

242 

255 

4 

5.2

357 

55 267 

279 

291 

303 

315 

328 

340 

352 

364 

376 

5

6.5

358 

55 388 

400 

413 

425 

437 

449 

461 

473 

485 

497 

6 

7.8 

359 

55 509 

522 

534 

546 

558 

570 

582 

594 

606 

618 

8 

10.4

360 

55 630 

642 

654 

666 

678 

691 

703 

715 

727 

739 

9 

11.7 

361 

55 751 

763 

775 

787 

799 

811 

823 

835 

847 

859 



362 

55 871 

883 

895 

907 

919 

931 

943 

955 

967 

979 



363 

55 991 

*003 

*015 

*027 

*038 

*050 

*062 

*074 

*086 

*098 



364 

56 110

122 

134 

146 

158 

170 

182 

194 

205 

217 



365 

56 229 

241 

253 

265 

277 

289 

301 

312 

324 

336 


12 

366 

56 348 

360 

372 

384 

396 

407 

419 

431 

443 

455 

1 

2 

1.2 

2.4 

367 

56 467

478 

490 

502 

514 

526 

538 

549 

561 

573 

3 

3.6 

368

56 585 

597 

608 

620 

632 

644 

656 

667 

679 

691 

4 

4.8 

369 

56 703 

714 

726 

738 

750 

761 

773 

785 

797 

808 

5 

6 

6.0 

7.2 

370

56 820 

832 

844 

855 

867 

879 

891 

902 

914 

926 

7 

8.4 

371 

56 937 

949 

961 

972 

984 

996 

*008 

*019 

*031 

*043 

8

9.6

372 

57 054 

066 

078 

089 

101

113 

124 

136 

148 

159 



373 

57 171 

183 

194 

206 

217 

229 

241 

252 

264 

276 



374 

57 287 

299 

310 

322 

334 

345 

357 

368 

380 

392 



375 

57 403 

415 

426 

438 

449 

461 

473 

484 

496 

507 



376 

57 519

530 

542 

553 

565 

576 

588 

600 

611 

623 


11 

377 

57 634 

646 

657 

669 

680 

692 

703 

715 

726 

738 

1

1.1

378 

57 749 

761 

772 

784 

795 

807 

818 

830 

841 

852 

2 

2.2 

379 

57 864 

875 

887 

898 

910 

921 

933 

944 

955 

967 

3 

3.3 

380 

57 978 

990 

*001 

*013 

*024 

*035 

*047 

*058 

*070 

*081 

4 

5

4.4

5.5

381 

58 092 

104 

115 

127 

138 

149 

161 

172 

184 

195 

6 

6.6 

382 

58 206 

218 

229 

240 

252 

263 

274 

286 

297 

309 

7 

8 

7.7 

8.8

383 

58 320 

331 

343 

354 

365 

377 

388 

399 

410 

422 

9 

9.9

384 

58 433 

444 

456 

467 

478 

490 

501 

512 

524 

535 



385 

58 546 

557 

569 

580 

591 

602 

614 

625 

636 

647 



386 

58 659 

670 

681 

692 

704 

715 

726 

737 

749 

760 



387 

58 771 

782 

794 

805 

816 

827 

838 

850 

861 

872 



388 

58 883 

894 

906 

917 

928 

939 

950 

961 

973 

984 


10 

389 

58 995 

*006 

*017 

*028 

*040 

*051 

*062 

*073 

*084 

*095 

1

1.0

390 

59 106 

118 

129 

140 

151 

162 

173 

184 

195 

207 

2 

2.0 

391 

59 218 

229 

240 

251 

262 

273 

284 

295 

306 

318 

3 

4 

3.0 

4.0 

392 

59 329 

340 

351 

362 

373 

384 

395 

406 

417 

428 

5 

5.0

393 

59 439 

450 

461 

472 

483 

494 

506 

517 

528 

539 

6 

7

6.0 

7.0 

394 

59 550 

561 

572 

583 

594 

605 

616 

627 

638 

649 

8 

8.0 

395 

59 660 

671 

682 

693 

704 

715 

726 

737 

748 

759 

9 

9.0

396 

59 770 

780 

791 

802

813 

824 

835 

846 

857 

868 



397 

59 879 

890 

901 

912 

923 

934 

945 

956 

966 

977 



398 

59 988 

999 

*010 

*021 

*032 

*043 

*054 

*065 

*076 

*086 



399 

60 097 

108 

119 

130 

141 

152 

163 

173 

184 

195 



400 

60 206 

217 

228 

239 

249 

260 

271 

282 

293 

304 





V. LOGARITHMS OF NUMBERS, 4000-4509 [FIVE-PLACE]

N    0    1    2    3    4    5    6    7    8    9    Prop. Parts

400 

60 206

217 

228 

239 

249 

260 

271 

282 

293

304 



401 

60 314

325 

336 

347 

358 

369 

379 

390 

401 

412 



402 

60 423 

433 

444 

455 

466 

477 

487 

498 

509 

520 



403 

60 531 

541 

552 

563 

574 

584 

595 

606 

617 

627 



404 

60 638 

649 

660 

670 

681 

692 

703 

713 

724 

735 



405 

60 746 

756 

767 

778 

788 

799 

810 

821 

831 

842 



406 

60 853 

863 

874 

885 

895 

906 

917 

927 

938 

949 



407 

60 959

970 

981 

991 

*002 

*013 

*023 

*034 

*045 

*055 


11 

408 

61 066 

077 

087 

098 

109 

119

130 

140 

151

162

1

1.1

409 

61 172 

183 

194 

204 

215 

225 

236 

247 

257 

268 

2 

2.2 

410 

61 278 

289 

300 

310 

321 

331 

342 

352 

363 

374 

3 

4 

3.3

4.4 

411 

61 384 

395 

405 

416 

426 

437 

448 

458 

469 

479 

5 

5.5

412 

61 490 

500 

511 

521 

532 

542 

553 

563 

574 

584 

6 

6.6 

413 

61 595 

606 

616 

627 

637 

648 

658 

669 

679 

690 

8 

8.8 

414 

61 700 

711 

721 

731 

742 

752 

763 

773 

784

794 

9 

9.9 

415 

61 805 

815 

826 

836 

847 

857 

868 

878 

888 

899 



416 

61 909 

920 

930 

941 

951 

962 

972 

982 

993 

*003 



417 

62 014 

024 

034 

045 

055 

066 

076 

086 

097 

107 



418 

62 118 

128 

138 

149 

159 

170 

180 

190 

201 

211



419 

62 221 

232 

242 

252 

263 

273 

284 

294 

304 

315 



420 

62 325 

335 

346 

356 

366 

377 

387 

397 

408 

418 



421 

62 428 

439 

449 

459 

469 

480 

490 

500 

511 

521 



422 

62 531 

542 

552 

562 

572 

583 

593 

603 

613 

624 


10 

423 

62 634 

644 

655 

665 

675 

685 

696 

706 

716 

726 

1

1.0

424 

62 737 

747 

757 

767 

778 

788 

798 

808 

818 

829 

3 

3.0 

425 

62 839 

849 

859 

870 

880 

890 

900 

910 

921 

931 

4 

4.0 

426 

62 941 

951 

961 

972 

982 

992 

*002 

*012 

*022 

*033 

5 

6 

5.0

6.0 

427 

63 043 

053 

063 

073 

083 

094 

104 

114 

124 

134 

7 

7.0 

428 

63 144 

155 

165 

175 

185

195 

205 

215 

225 

236 

8 

8.0

429 

63 246 

256 

266 

276 

286 

296 

306 

317 

327 

337 

9 

9.0

430 

63 347 

357 

367 

377 

387 

397 

407 

417 

428 

438 



431 

63 448 

458 

468 

478 

488 

498 

508 

518 

528 

538 



432 

63 548 

558 

568 

579 

589

599 

609 

619 

629 

639 



433 

63 649 

659 

669 

679 

689 

699 

709 

719 

729 

739 



434 

63 749 

759 

769 

779 

789 

799 

809 

819 

829 

839 



435 

63 849 

859 

869 

879 

889 

899 

909 

919 

929 

939 



436 

63 949

959 

969 

979 

988 

998 

*008 

*018 

*028 

*038 


9 

437 

64 048 

058 

068 

078 

088 

098 

108 

118 

128 

137 

1

0.9 

438 

64 147

157 

167 

177 

187 

197 

207 

217 

227 

237 

2 

1.8 

439 

64 246 

256 

266 

276 

286 

296 

306 

316 

326 

335 

3 

4 

2.7

3.6

440 

64 345 

355 

365 

375 

385 

395 

404 

414 

424 

434 

5

4.5

441 

64 444

454 

464 

473 

483 

493 

503 

513 

523 

532 

6 

5.4

6.3 

442 

64 542 

552 

562 

572 

582 

591 

601 

611 

621 

631 

8 

7.2 

443 

64 640 

650 

660 

670 

680 

689 

699 

709 

719 

729 

9 

8.1 

444 

64 738

748 

758 

768 

777 

787 

797 

807 

816 

826 



445 

64 836 

846 

856 

865 

875 

885 

895 

904 

914 

924 



446 

64 933

943 

953 

963 

972 

982 

992 

*002 

*011 

*021 



447 

65 031

040 

050 

060 

070 

079 

089 

099 

108 

118 



448 

65 128 

137 

147 

157 

167 

176 

186 

196 

205 

215 



449 

65 225 

234 

244 

254 

263 

273 

283 

292 

302 

312 



450 

65 321 

331 

341 

350 

360 

369 

379 

389 

398 

408 






V. LOGARITHMS OF NUMBERS, 4500-5009 [FIVE-PLACE]

N    0    1    2    3    4    5    6    7    8    9    Prop. Parts



450

65 321 

331 

341 

350 

360 

369 

379 

389 

398 

408 



451 

65 418 

427 

437 

447

456 

466 

475 

485 

495 

504 



452 

65 514 

523 

533 

543 

552 

562 

571 

581 

591 

600 



453 

65 610 

619 

629 

639 

648 

658 

667 

677 

686 

696 



454 

65 706 

715 

725 

734 

744 

753 

763 

772 

782 

792 



455 

65 801 

811 

820 

830 

839 

849 

858 

868 

877 

887 



456 

65 896 

906 

916 

925 

935 

944 

954 

963 

973 

982 


10 

457 

65 992 

*001 

*011 

*020 

*030 

*039 

*049 

*058 

*068 

*077 

1

1.0

458 

66 087 

096 

106 

115

124 

134 

143 

153 

162 

172 

2 

2.0 

3.0 

4.0 

459 

66 181 

191 

200 

210 

219 

229 

238 

247 

257 

266 

3 

4 

460 

66 276 

285 

295 

304 

314 

323 

332 

342 

351 

361 

5

5.0 

6.0 

7.0 

8.0 

461 

66 370 

380 

389 

398 

408 

417 

427 

436 

445 

455 

6 

462 

66 464 

474 

483 

492 

502 

511 

521 

530 

539 

549 

8 

463 

66 558 

567 

577 

586 

596 

605 

614 

624 

633 

642 

9 

9.0 

464 

66 652

661 

671 

680 

689 

699 

708 

717 

727 

736 



465 

66 745 

755 

764 

773 

783 

792 

801 

811 

820 

829 



466 

66 839 

848 

857 

867 

876 

885 

894 

904 

913 

922 



467 

66 932 

941 

950 

960 

969 

978 

987 

997 

*006 

*015 



468 

67 025

034 

043 

052 

062 

071 

080 

089 

099 

108 



469 

67 117

127 

136 

145 

154 

164 

173 

182 

191 

201 



470 

67 210 

219 

228 

237 

247 

256 

265 

274 

284 

293 



471 

67 302 

311 

321 

330 

339 

348 

357 

367 

376 

385 


9 

472 

67 394

403 

413 

422 

431 

440 

449 

459 

468 

477 

1

0.9

1.8

473 

67 486 

495 

504 

514 

523 

532 

541 

550 

560 

569 

3 

2.7 

474 

67 578 

587 

596 

605 

614 

624 

633 

642 

651 

660 

4 

3.6 

475 

67 669 

679 

688 

697 

706 

715 

724 

733 

742 

752 

5 

6 

4.5 

5.4

476 

67 761 

770 

779 

788 

797 

806 

815 

825 

834 

843 

7 

8 

9

6.3 

477 

67 852 

861 

870 

879 

888 

897 

906 

916 

925 

934 

7.2 

8.1 

478 

67 943 

952 

961 

970 

979 

988 

997 

*006 

*015 

*024 


479 

68 034 

043 

052 

061 

070 

079 

088 

097 

106 

115 



480 

68 124 

133 

142 

151 

160 

169 

178 

187 

196 

205 



481 

68 215 

224 

233 

242 

251 

260 

269 

278 

287 

296 



482 

68 305 

314 

323 

332 

341 

350 

359 

368 

377 

386 



483 

68 395 

404 

413 

422 

431 

440 

449 

458 

467 

476 



484 

68 485 

494 

502 

511 

520 

529 

538 

547 

556 

565 



485 

68 574 

583 

592 

601 

610 

619 

628 

637 

646 

655 


8 

486 

68 664 

673 

681 

690 

699 

708 

717 

726 

735 

744 

1

0.8 

487 

68 753 

762 

771 

780 

789 

797 

806 

815 

824 

833 

2 

3

1.6 

2.4 

3.2 

488 

68 842 

851 

860 

869 

878 

886 

895 

904 

913 

922 


4 

489 

68 931 

940 

949 

958 

966 

975 

984 

993 

*002 

*011 

5 

4.0 

4.8 

5.6 

490 

69 020 

028 

037 

046 

055 

064 

073 

082 

090 

099 

6 

7 

491 

69 108 

117 

126 

135 

144 

152 

161 

170 

179 

188 

8 

6.4 

492 

69 197 

205 

214 

223 

232 

241 

249 

258 

267 

276 

9 

7.2 

493 

69 285 

294 

302 

311 

320 

329 

338 

346 

355 

364 



494 

69 373 

381 

390 

399 

408 

417 

425 

434 

443 

452 



495 

69 461 

469 

478

487 

496 

504 

513 

522 

531 

539 



496 

69 548 

557 

566 

574 

583 

592 

601 

609 

618 

627 



497 

69 636 

644 

653 

662 

671 

679 

688 

697 

705 

714 



498 

69 723 

732 

740 

749 

758 

767 

775 

784 

793 

801 



499 

69 810 

819 

827 

836 

845 

854 

862 

871 

880 

888 



500

69 897 

906 

914 

923 

932 

940 

949 

958 

966 

975 





V. LOGARITHMS OF NUMBERS, 5000-5509 [FIVE-PLACE]







V. LOGARITHMS OF NUMBERS, 5500-6009 [FIVE-PLACE]

N    0    1    2    3    4    5    6    7    8    9    Prop. Parts



550

74 036 

044 

052

060 

068 

076 

084 

092 

099 

107 



551 

74 115

123 

131 

139 

147 

155

162 

170 

178 

186 



552 

74 194 

202 

210

218 

225 

233 

241 

249 

257 

265 



553 

74 273 

280 

288 

296 

304 

312 

320 

327 

335 

343 



554 

74 351 

359 

367 

374 

382 

390 

398 

406 

414 

421 



555 

74 429 

437 

445 

453 

461 

468 

476 

484 

492 

500 



556 

74 507 

515 

523 

531 

539 

547 

554 

562 

570 

578 



557 

74 586 

593 

601 

609 

617 

624 

632 

640 

648

656 



558 

74 663 

671 

679 

687 

695 

702 

710 

718 

726 

733 



559 

74 741

749 

757 

764 

772 

780 

788 

796 

803 

811 



560

74 819 

827 

834 

842 

850 

858 

865 

873 

881

889 



561 

74 896

904 

912 

920 

927 

935 

943 

950 

958 

966 


8 

562 

74 974 

981 

989 

997 

*005 

*012 

*020 

*028 

*035 

*043 

1

0.8 

563 

75 051 

059 

066 

074 

082 

089 

097 

105

113

120 

2 

3 

1.6 

2.4 

564 

75 128 

136 

143 

151 

159 

166 

174

182 

189 

197 

4 

3.2 

565

75 205 

213 

220 

228 

236 

243 

251 

259 

266 

274 

5 

5 

4.0 

566 

75 282 

289 

297 

305 

312 

320 

328 

335 

343 

351 

7 

5.6

567 

75 358 

366 

374 

381 

389 

397 

404 

412 

420 

427 

8 

6.4 

568 

75 435 

442 

450 

458 

465 

473 

481 

488

496 

504 

9 

7.2 

569 

75 511

519 

526 

534 

542 

549 

557 

565 

572 

580 



570

75 587 

595 

603 

610 

618 

626 

633 

641 

648 

656 



571 

75 664 

671 

679 

686 

694 

702 

709 

717 

724 

732 



572 

75 740 

747 

755 

762 

770 

778 

785 

793 

800 

808 



573 

75 815

823 

831 

838 

846 

853 

861 

868 

876 

884 



574 

75 891 

899 

906 

914 

921 

929 

937 

944 

952 

959 



575 

75 967 

974 

982 

989 

997 

*005 

*012 

*020 

*027 

*035 



576 

76 042 

050 

057 

065 

072 

080 

087 

095 

103 

110



577 

76 118 

125 

133 

140 

148 

155

163 

170 

178 

185 



578 

76 193 

200 

208 

215 

223 

230 

238 

245 

253 

260 



579 

76 268 

275 

283 

290 

298 

305 

313 

320 

328 

335 



580

76 343 

350 

358 

365 

373 

380 

388 

395 

403 

410 


7 

581 

76 418 

425 

433 

440 

448 

455 

462 

470 

477 

485 


0.7

582 

76 492

500 

507 

515 

522 

530 

537 

545 

552 

559 

2 


1.4 

583 

76 567 

574 

582 

589 

597 

604 

612 

619 

626 

634 

3 

4

2.1 

2.8 

584 

76 641 

649 

656 

664 

671 

678 

686 

693 

701 

708 

5

3.5

585 

76 716 

723 

730 

738 

745 

753 

760 

768 

775 

782 

6 

4.2 

586 

76 790 

797 

805 

812 

819 

827 

834 

842 

849 

856 

7 

8 

4.9 

5.6 

587 

76 864 

871 

879 

886 

893 

901 

908 

916 

923 

930 

9 

6.3

588 

76 938 

945 

953 

960 

967 

975 

982 

989 

997 

*004 



589 

77 012 

019 

026 

034 

041 

048 

056 

063 

070 

078 



590

77 085 

093 

100 

107 

115

122 

129 

137 

144 

151 



591 

77 159 

166 

173 

181 

188 

195 

203 

210 

217 

225 



592 

77 232 

240 

247 

254 

262 

269 

276 

283 

291 

298 



593 

77 305 

313 

320 

327 

335 

342 

349 

357 

364 

371 



594 

77 379 

386 

393 

401 

408 

415 

422 

430 

437 

444 



595 

77 452 

459 

466 

474 

481 

488 

495 

503 

510 

517 



596 

77 525 

532 

539 

546 

554 

561 

568 

576 

583 

590 



597 

77 597 

605 

612 

619 

627 

634 

641 

648 

656 

663 



598 

77 670 

677 

685 

692 

699 

706 

714 

721 

728 

735 



599 

77 743 

750 

757 

764 

772 

779 

786 

793 

801 

808 



600 

77 815 

822 

830 

837 

844 

851 

859 

866 

873 

880 





V. LOGARITHMS OF NUMBERS, 6000-6509 [FIVE-PLACE]

N    0    1    2    3    4    5    6    7    8    9    Prop. Parts

600 

77 815

822 

830 

837 

844 

851 

859 

866 

873 

880 



601 

77 887 

895 

902 

909 

916 

924 

931 

938 

945 

952 



602 

77 960 

967 

974 

981 

988 

996 

*003 

*010 

*017 

*025 



603 

78 032 

039 

046 

053 

061 

068 

075 

082 

089 

097 



604 

78 104 

111

118 

125 

132 

140 

147 

154 

161 

168 



605 

78 176 

183 

190 

197 

204 

211 

219 

226 

233 

240 



606 

78 247 

254 

262 

269 

276 

283 

290 

297 

305 

312 



607 

78 319

326 

333 

340 

347 

355 

362 

369 

376 

383 


8 

608 

78 390 

398 

405 

412 

419 

426 

433 

440 

447 

455 

1

0.8 

1.6 

609 

78 462 

469 

476 

483 

490 

497 

504 

512 

519 

526 


610 

78 533

540 

547 

554 

561 

569 

576 

583 

590 

597 

4 

3.2 

611 

78 604 

611

618 

625 

633 

640 

647 

654 

661 

668 

5

6

4.0 

4.8 

5.6 

612 

78 675

682 

689 

696 

704 

711 

718 

725 

732 

739 

7 

613 

78 746 

753 

760 

767 

774 

781 

789 

796 

803 

810 

8 

6.4 

614 

78 817

824 

831 

838 

845 

852 

859 

866 

873 

880 

9 

7.2 

615 

78 888 

895 

902 

909 

916 

923 

930 

937 

944 

951



616 

78 958 

965 

972 

979 

986 

993 

*000 

*007 

*014 

*021 



617 

79 029 

036 

043 

050 

057 

064 

071 

078 

085 

092 



618 

79 099 

106 

113

120 

127 

134 

141 

148 

155 

162 



619 

79 169 

176 

183 

190 

197 

204 

211 

218 

225 

232 



620 

79 239 

246 

253 

260 

267 

274 

281 

288 

295 

302 



621 

79 309 

316 

323 

330 

337 

344 

351 

358 

365 

372 


7 

622 

79 379 

386 

393 

400 

407 

414 

421 

428 

435 

442 


0.7 

1.4 

623 

79 449 

456 

463 

470 

477 

484 

491 

498 

505 

511 

2 

624 

79 518

525 

532 

539 

546 

553 

560 

567 

574 

581 

3 

2.1 

2.8 

3.5

625

79 588

595 

602 

609 

616 

623 

630 

637 

644 

650 

4 

5 

626 

79 657 

664 

671 

678 

685 

692 

699 

706 

713 

720 

6 

4.2 

627 

79 727 

734 

741 

748 

754 

761 

768 

775 

782 

789 

7 

8 

4.9 

5.6

628 

79 796 

803 

810 

817 

824 

831 

837 

844 

851 

858 

9 

6.3 

629 

79 865 

872 

879 

886 

893 

900 

906 

913 

920 

927 



630 

79 934 

941 

948 

955 

962 

969 

975 

982 

989 

996 



631 

80 003 

010 

017 

024 

030 

037 

044 

051 

058 

065 



632 

80 072 

079 

085 

092 

099 

106 

113 

120 

127 

134 



633 

80 140 

147 

154 

161 

168 

175 

182 

188 

195 

202 



634 

80 209 

216 

223 

229 

236 

243 

250 

257 

264 

271 



635 

80 277 

284 

291 

298 

305 

312 

318 

325 

332 

339 



636 

80 346 

353 

359 

366 

373 

380 

387 

393 

400 

407 


6 

637 

80 414 

421 

428 

434 

441 

448 

455 

462 

468 

475 

1

0.6

638 

80 482 

489 

496 

502 

509 

516 

523 

530 

536 

543 

2 

3 

1.2 

1.8 

639 

80 550 

557 

564 

570 

577 

584 

591 

598 

604 

611 

4 

2.4 

640 

80 618 

625 

632 

638 

645 

652 

659 

665 

672 

679 

5 

3.0 

641 

80 686 

693 

699 

706 

713 

720 

726 

733 

740 

747 

6 

7 

3.6

4.2 

642 

80 754 

760 

767 

774 

781 

787 

794 

801 

808 

814 

8 

4.8

643 

80 821 

828 

835 

841 

848 

855 

862 

868 

875 

882 

9 

5.4 

644 

80 889 

895 

902 

909 

916 

922 

929 

936 

943 

949 



645 

80 956 

963 

969 

976 

983 

990 

996 

*003 

*010

*017 



646 

81 023 

030 

037 

043 

050 

057 

064 

070 

077 

084 



647 

81 090 

097 

104 

111

117 

124 

131 

137 

144 

151 



648 

81 158 

164 

171 

178 

184 

191 

198 

204 

211 

218 



649 

81 224 

231 

238 

245 

251 

258 

265 

271 

278 

285 



650

81 291 

298 

305 

311 

318 

325 

331 

338 

345 

351 






V. LOGARITHMS OF NUMBERS, 6500-7009 [FIVE-PLACE]

N    0    1    2    3    4    5    6    7    8    9    Prop. Parts




650 

81 291

298 

305

311 

318 

325 

331 

338 

345 

351 



651 

81 358 

365 

371 

378 

385 

391 

398 

405 

411

418 



652 

81 425 

431 

438 

445 

451 

458 

465 

471 

478 

485 



653 

81 491 

498 

505 

511 

518 

525 

531 

538 

544 

551 



654 

81 558 

564 

571 

578 

584 

591 

598 

604 

611 

617 



655 

81 624 

631 

637 

644 

651 

657 

664 

671 

677 

684 



656 

81 690 

697 

704 

710 

717 

723 

730 

737 

743 

750 



657 

81 757 

763 

770 

776 

783 

790 

796 

803 

809 

816 



658 

81 823 

829 

836 

842 

849 

856 

862 

869 

875 

882 



659

81 889 

895 

902 

908 

915 

921 

928 

935 

941

948




660 

81 954 

961 

968 

974 

981 

987 

994 

*000 

*007 

*014 



661 

82 020 

027 

033 

040 

046 

053 

060 

066 

073 

079 


7 

662 

82 086 

092 

099 

105 

112 

119 

125

132 

138 

145 

1

0.7 

663 

82 151 

158 

164 

171 

178 

184 

191 

197 

204 

210 

2 

3 

1.4 

2.1 

664 

82 217 

223 

230 

236 

243 

249 

256 

263 

269 

276 

4 

2.8 

665 

82 282 

289 

295 

302 

308 

315 

321 

328 

334 

341 

5 

3.5

666

82 347

354 

360 

367 

373 

380 

387 

393 

400 

406 

7 

4.2

4.9

667

82 413

419 

426 

432 

439 

445 

452 

458 

465 

471 

8 

5.6 

668 

82 478 

484 

491 

497 

504 

510 

517 

523 

530 

536 

9 

6.3

669

82 543

549 

556 

562 

569 

575 

582 

588 

595 

601 



670 

82 607 

614 

620 

627 

633 

640 

646 

653 

659 

666 



671 

82 672 

679 

685 

692 

698 

705 

711 

718 

724 

730 



672 

82 737 

743 

750 

756 

763 

769 

776 

782 

789 

795 



673 

82 802 

808 

814 

821 

827 

834 

840 

847 

853 

860 



674 

82 866 

872 

879 

885 

892 

898 

905 

911 

918 

924 



675 

82 930 

937 

943 

950 

956 

963 

969 

975 

982 

988 



676 

82 995

*001

*008

*014

*020 

*027 

*033 

*040 

*046 

*052 



677 

83 059 

065 

072 

078 

085 


097 

104 

110

117



678 

83 123 

129 

136 

142 

149 

155 

161 

168 

174 

181 



679 

83 187 

193 

200 

206 

213 

219 

225 

232 

238 

245 



680

83 251

257

264 

270 

276 

283 

289 

296 

302 

308 



681

83 315

321

327 

334 

340 

347 

353 

359 

366 

372 


6

682 

83 378 

385 

391 

398 

404 

410 

417 

423 

429 

436 

1 

2 

0.6 

1.2 

683 

83 442

448 

455 

461 

467 

474 

480 

487 

493 

499 

3 

1.8 

684 

83 506 

512 

518 

525 

531 

537 

544 

550 

556 

563 

4 

5

2.4 

3.0 

3.6 

685 

83 569 

575 

582 

588 

594 

601 

607 

613 

620 

626 


6 

686 

83 632 

639 

645 

651 

658 

664 

670 

677 

683 

689 

7 

8 

4.2 

4.8 

687 

83 696 

702 

708 

715 

721 

727 

734 

740 

746 

753 

9 

5.4

688 

83 759 

765 

771 

778 

784 

790 

797 

803 

809 

816 



689 

83 822 

828 

835 

841 

847 

853 

860 

866 

872 

879 



690

83 885

891

897 

904 

910 

916 

923 

929 

935 

942 



691

83 948

954

960

967

973

979

985

992

998

*004



692 

84 011

017 

023 

029 

036 

042 

048 

055 

061 

067 



693 

84 073

080 

086 

092 

098 

105 

111

117 

123 

130 



694 

84 136 

142 

148 

155 

161 

167 

173 

180 

186 

192 



695 

84 198 

205 

211

217 

223 

230 

236 

242 

248 

255 



696 

84 261 

267 

273 

280 

286 

292 

298 

305 

311 

317 



697 

84 323 

330 

336 

342 

348 

354 

361 

367 

373 

379 



698 

84 386 

392 

398 

404 

410 

417 

423 

429 

435 

442 



699 

84 448 

454 

460 

466 

473 

479 

485 

491 

497 

504 



700

84 510 

516 

522 

528 

535 

541 

547 

553 

559 

566 





V. LOGARITHMS OF NUMBERS, 7000-7509 [FIVE-PLACE]

N    0    1    2    3    4    5    6    7    8    9    Prop. Parts

700 

84 510

516 

522 

528 

535 

541

547 

553 

559 

566 



701 

84 572 

578 

584 

590 

597 

603

609 

615 

621 

628 



702 

84 634

640 

646 

652 

658 

665 

671 

677 

683 

689 



703 

84 696 

702 

708 

714 

720 

726 

733 

739 

745 

751 



704 

84 757 

763 

770 

776 

782 

788 

794 

800 

807 

813 



705 

84 819 

825 

831 

837 

844 

850 

856 

862 

868 

874 



706 

84 880 

887 

893 

899 

905 

911

917 

924 

930 

936 



707 

84 942 

948 

954 

960 

967 

973 

979 

985 

991 

997 


7 

708 

85 003 

009 

016 

022 

028 

034 

040 

046 

052 

058 

1

0.7 

709 

85 065

071 

077 

083 

089 

095 

101

107

114

120 

2 

1.4

710 

85 126 

132 

138 

144 

150 

156 

163 

169 

175 

181 

3 

4 

2.8 

711 

85 187

193 

199 

205 

211 

217 

224 

230 

236 

242 

5

3.5

712

85 248

254 

260 

266 

272 

278 

285 

291 

297 

303 

6 

7 

4.2 

4.9 

713 

85 309 

315 

321 

327 

333 

339 

345 

352 

358 

364 

8 

5.6 

714 

85 370 

376 

382 

388 

394 

400 

406 

412 

418 

425 

9 

6.3 

715 

85 431

437 

443 

449 

455 

461 

467 

473 

479 

485 



716 

85 491

497 

503 

509 

516 

522 

528 

534 

540 

546 



717 

85 552 

558 

564 

570 

576 

582 

588 

594 

600 

606 



718 

85 612 

618 

625 

631 

637 

643 

649 

655 

661 

667 



719 

85 673 

679 

685 

691 

697 

703 

709 

715 

721

727 



720 

85 733 

739 

745 

751 

757 

763 

769 

775 

781 

788 



721 

85 794 

800 

806 

812 

818 

824 

830 

836 

842 

848 



722 

85 854 

860 

866 

872 

878 

884 

890 

896 

902 

908 


6 

723 

85 914

920 

926 

932 

938 

944 

950 

956 

962 

968 

1 

2 

0.6

724 

85 974 

980 

986 

992 

998 

*004 

*010 

*016 

*022 

*028 

3 

1.8 

725 

86 034 

040 

046 

052 

058 

064 

070 

076 

082 

088 

4 

2.4

726 

86 094 

100 

106 

112 

118 

124 

130 

136 

141 

147 

5 

6 

3.0

3.6 

727 

86 153 

159 

165 

171 

177 

183 

189 

195 

201 

207 

7

8

4.2

4.8

728

86 213

219 

225 

231 

237 

243 

249 

255 

261 

267 

9

5.4

729 

86 273 

279 

285 

291 

297 

303 

308 

314 

320 

326 



730 

86 332 

338 

344 

350 

356 

362 

368 

374 

380 

386 



731

86 392 

398 

404 

410 

415 

421 

427 

433 

439 

445 



732 

86 451 

457 

463 

469 

475 

481 

487 

493 

499 

504 



733 

86 510 

516 

522 

528 

534 

540 

546 

552 

558 

564 



734 

86 570 

576 

581 

587 

593 

599 

605 

611 

617 

623 



735 

86 629 

635 

641 

646 

652 

658 

664 

670 

676 

682 



736 

86 688 

694 

700 

705 

711 

717 

723 

729 

735 

741 


5 

737 

86 747 

753 

759 

764 

770 

776 

782 

788 

794 

800 

1

0.5 

738 

86 806 

812 

817 

823 

829 

835 

841 

847 

853 

859 

2 

1.0

739 

86 864 

870 

876 

882 

888 

894 

900 

906 

911 

917 

3 

4 

1.5

2.0

740 

86 923 

929 

935 

941 

947 

953 

958 

964 

970 

976 

5 

2.5

741 

86 982 

988 

994 

999 

*005 

*011 

*017 

*023 

*029 

*035 

6 

3.0 

3.5

742 

87 040 

046 

052 

058 

064 

070 

075 

081 

087 

093 

7

8

4.0

743

87 099

105 

111

116 

122 

128 

134 

140 

146 

151 

9 

4.5

744 

87 157 

163 

169 

175 

181 

186 

192 

198 

204 

210 



745 

87 216 

221 

227 

233 

239 

245 

251 

256 

262 

268 



746 

87 274 

280 

286 

291 

297 

303 

309 

315 

320 

326 



747 

87 332 

338 

344 

349 

355 

361 

367 

373 

379 

384 



748 

87 390 

396 

402 

408 

413 

419 

425 

431 

437 

442 



749 

87 448

454 

460 

466 

471 

477 

483 

489 

495 

500 



750

87 506 

512 

518 

523 

529 

535 

541 

547 

552 

558 







V. LOGARITHMS OF NUMBERS, 7500-8009 [FIVE-PLACE]

N    0    1    2    3    4    5    6    7    8    9    Prop. Parts



750

87 506 

512 

518 

523 

529 

535 

541 

547 

552 

558 



751 

87 564 

570 

576 

581 

587 

593 

599 

604 

610 

616 



752 

87 622 

628 

633 

639 

645 

651 

656 

662 

668 

674 



753 

87 679 

685 

691 

697 

703 

708 

714 

720 

726 

731 



754 

87 737 

743 

749 

754 

760 

766 

772 

777 

783 

789 



755 

87 795 

800 

806 

812 

818 

823 

829 

835 

841 

846 



756 

87 852 

858 

864 

869 

875 

881 

887 

892 

898 

904 



757 

87 910 

915 

921 

927 

933 

938 

944 

950 

955 

961 



758

87 967 

973 

978 

984 

990 

996 

*001

*007 

*013 

*018 



759 

88 024 

030 

036 

041 

047 

053 

058 

064 

070 

076 



760 

88 081 

087 

093 

098 

104 

110

116 

121 

127 

133 



761 

88 138 

144 

150 

156 

161 

167 

173 

178 

184 

190 


6 

762 

88 195

201 

207 

213 

218 

224 

230 

235 

241 

247 

1

0.6 

763 

88 252 

258 

264 

270 

275 

281 

287 

292 

298 

304 

2 

3 

1.2 

1.8 

764 

88 309

315 

321 

326 

332 

338 

343 

349 

355 

360 

4 

2.4 

765 

88 366 

372 

377 

383 

389 

395 

400 

406 

412 

417 

5 

6 

3.0 

766 

88 423 

429 

434 

440 

446 

451 

457 

463 

468 

474 

7 

4.2 

767 

88 480 

485 

491 

497 

502 

508 

513 

519 

525 

530 

8 

4.8 

768 

88 536 

542 

547 

553 

559 

564 

570 

576 

581 

587 

9 

5.4

769 

88 593 

598 

604 

610 

615 

621 

627 

632 

638 

643 



770 

88 649 

655 

660 

666 

672 

677 

683 

689 

694 

700 



771 

88 705 

711 

717 

722 

728 

734 

739 

745 

750 

756 



772 

88 762 

767 

773 

779 

784 

790 

795 

801 

807 

812 



773 

88 818 

824 

829 

835 

840 

846 

852 

857 

863 

868 



774 

88 874 

880 

885 

891 

897 

902 

908 

913 

919 

925 



775 

88 930 

936 

941 

947 

953 

958 

964 

969 

975 

981 



776 

88 986 

992 

997 

*003 

*009 

*014 

*020 

*025 

*031 

*037 



777 

89 042 

048 

053 

059 

064 

070 

076 

081 

087 

092 



778 

89 098 

104 

109 

115 

120 

126 

131 

137 

143 

148 



779 

89 154 

159 

165 

170 

176 

182 

187 

193 

198 

204 



780 

89 209 

215 

221 

226 

232 

237 

243 

248 

254 

260 


5 

781 

89 265 

271 

276 

282 

287 

293 

298 

304 

310 

315 

1

0.5 

782 

89 321 

326 

332 

337 

343 

348 

354 

360 

365 

371 

2 

1.0

783 

89 376 

382 

387 

393 

398 

404 

409 

415 

421 

426 

3 

4 

1.5

2.0

784 

89 432 

437 

443 

448 

454 

459 

465 

470 

476 

481 

5

2.5

785

89 487

492 

498 

504 

509 

515 

520 

526 

531 

537 

6 

7

3.0

3.5

4.0 

786 

89 542 

548 

553 

559 

564 

570 

575 

581 

586 

592 


8 

787 

89 597 

603 

609 

614 

620 

625 

631 

636 

642

647 

9 

4.5

788 

89 653 

658 

664 

669 

675 

680 

686 

691 

697 

702 



789 

89 708 

713 

719 

724 

730 

735 

741 

746 

752 

757 



790 

89 763 

768 

774 

779 

785 

790 

796 

801 

807 

812 



791 

89 818 

823 

829 

834 

840 

845 

851 

856 

862 

867 



792 

89 873 

878 

883 

889 

894 

900 

905 

911 

916 

922 



793 

89 927 

933 

938 

944 

949 

955 

960 

966 

971 

977 



794 

89 982 

988 

993 

998 

*004 

*009 

*015 

*020 

*026 

*031 



795 

90 037 

042 

048 

053 

059 

064 

069 

075 

080 

086 



796 

90 091 

097 

102 

108 

113 

119 

124 

129 

135 

140 



797 

90 146 

151 

157 

162 

168 

173 

179 

184 

189 

195 



798 

90 200 

206 

211 

217 

222 

227 

233 

238 

244 

249 



799 

90255 

260 

266 

271 

276 

282 

287 

293 

298 

304 



800 

90 309 

314 

320 

325 

331 

336 

342 

347 

352 

358 

Prop. Parts 

N 

0 

1 

2 

3 

4 

5 

6 1 

7 

8 

9 


— 677 — 7500 — LOGARITHMS OF NUMBERS — 8009 


V. 8000 - LOGARITHMS OF NUMBERS - 8509 [FIVE-PLACE]

   N       0     1     2     3     4     5     6     7     8     9
 800  90 309   314   320   325   331   336   342   347   352   358
 801  90 363   369   374   380   385   390   396   401   407   412
 802  90 417   423   428   434   439   445   450   455   461   466
 803  90 472   477   482   488   493   499   504   509   515   520
 804  90 526   531   536   542   547   553   558   563   569   574
 805  90 580   585   590   596   601   607   612   617   623   628
 806  90 634   639   644   650   655   660   666   671   677   682
 807  90 687   693   698   703   709   714   720   725   730   736
 808  90 741   747   752   757   763   768   773   779   784   789
 809  90 795   800   806   811   816   822   827   832   838   843
 810  90 849   854   859   865   870   875   881   886   891   897
 811  90 902   907   913   918   924   929   934   940   945   950
 812  90 956   961   966   972   977   982   988   993   998  *004
 813  91 009   014   020   025   030   036   041   046   052   057
 814  91 062   068   073   078   084   089   094   100   105   110
 815  91 116   121   126   132   137   142   148   153   158   164
 816  91 169   174   180   185   190   196   201   206   212   217
 817  91 222   228   233   238   243   249   254   259   265   270
 818  91 275   281   286   291   297   302   307   312   318   323
 819  91 328   334   339   344   350   355   360   365   371   376
 820  91 381   387   392   397   403   408   413   418   424   429
 821  91 434   440   445   450   455   461   466   471   477   482
 822  91 487   492   498   503   508   514   519   524   529   535
 823  91 540   545   551   556   561   566   572   577   582   587
 824  91 593   598   603   609   614   619   624   630   635   640
 825  91 645   651   656   661   666   672   677   682   687   693
 826  91 698   703   709   714   719   724   730   735   740   745
 827  91 751   756   761   766   772   777   782   787   793   798
 828  91 803   808   814   819   824   829   834   840   845   850
 829  91 855   861   866   871   876   882   887   892   897   903
 830  91 908   913   918   924   929   934   939   944   950   955
 831  91 960   965   971   976   981   986   991   997  *002  *007
 832  92 012   018   023   028   033   038   044   049   054   059
 833  92 065   070   075   080   085   091   096   101   106   111
 834  92 117   122   127   132   137   143   148   153   158   163
 835  92 169   174   179   184   189   195   200   205   210   215
 836  92 221   226   231   236   241   247   252   257   262   267
 837  92 273   278   283   288   293   298   304   309   314   319
 838  92 324   330   335   340   345   350   355   361   366   371
 839  92 376   381   387   392   397   402   407   412   418   423
 840  92 428   433   438   443   449   454   459   464   469   474
 841  92 480   485   490   495   500   505   511   516   521   526
 842  92 531   536   542   547   552   557   562   567   572   578
 843  92 583   588   593   598   603   609   614   619   624   629
 844  92 634   639   645   650   655   660   665   670   675   681
 845  92 686   691   696   701   706   711   716   722   727   732
 846  92 737   742   747   752   758   763   768   773   778   783
 847  92 788   793   799   804   809   814   819   824   829   834
 848  92 840   845   850   855   860   865   870   875   881   886
 849  92 891   896   901   906   911   916   921   927   932   937
 850  92 942   947   952   957   962   967   973   978   983   988

Prop. Parts
Diff.    1    2    3    4    5    6    7    8    9
   6   0.6  1.2  1.8  2.4  3.0  3.6  4.2  4.8  5.4
   5   0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5



V. 8500 - LOGARITHMS OF NUMBERS - 9009 [FIVE-PLACE]

   N       0     1     2     3     4     5     6     7     8     9
 850  92 942   947   952   957   962   967   973   978   983   988
 851  92 993   998  *003  *008  *013  *018  *024  *029  *034  *039
 852  93 044   049   054   059   064   069   075   080   085   090
 853  93 095   100   105   110   115   120   125   131   136   141
 854  93 146   151   156   161   166   171   176   181   186   192
 855  93 197   202   207   212   217   222   227   232   237   242
 856  93 247   252   258   263   268   273   278   283   288   293
 857  93 298   303   308   313   318   323   328   334   339   344
 858  93 349   354   359   364   369   374   379   384   389   394
 859  93 399   404   409   414   420   425   430   435   440   445
 860  93 450   455   460   465   470   475   480   485   490   495
 861  93 500   505   510   515   520   526   531   536   541   546
 862  93 551   556   561   566   571   576   581   586   591   596
 863  93 601   606   611   616   621   626   631   636   641   646
 864  93 651   656   661   666   671   676   682   687   692   697
 865  93 702   707   712   717   722   727   732   737   742   747
 866  93 752   757   762   767   772   777   782   787   792   797
 867  93 802   807   812   817   822   827   832   837   842   847
 868  93 852   857   862   867   872   877   882   887   892   897
 869  93 902   907   912   917   922   927   932   937   942   947
 870  93 952   957   962   967   972   977   982   987   992   997
 871  94 002   007   012   017   022   027   032   037   042   047
 872  94 052   057   062   067   072   077   082   086   091   096
 873  94 101   106   111   116   121   126   131   136   141   146
 874  94 151   156   161   166   171   176   181   186   191   196
 875  94 201   206   211   216   221   226   231   236   240   245
 876  94 250   255   260   265   270   275   280   285   290   295
 877  94 300   305   310   315   320   325   330   335   340   345
 878  94 349   354   359   364   369   374   379   384   389   394
 879  94 399   404   409   414   419   424   429   433   438   443
 880  94 448   453   458   463   468   473   478   483   488   493
 881  94 498   503   507   512   517   522   527   532   537   542
 882  94 547   552   557   562   567   571   576   581   586   591
 883  94 596   601   606   611   616   621   626   630   635   640
 884  94 645   650   655   660   665   670   675   680   685   689
 885  94 694   699   704   709   714   719   724   729   734   738
 886  94 743   748   753   758   763   768   773   778   783   787
 887  94 792   797   802   807   812   817   822   827   832   836
 888  94 841   846   851   856   861   866   871   876   880   885
 889  94 890   895   900   905   910   915   919   924   929   934
 890  94 939   944   949   954   959   963   968   973   978   983
 891  94 988   993   998  *002  *007  *012  *017  *022  *027  *032
 892  95 036   041   046   051   056   061   066   071   075   080
 893  95 085   090   095   100   105   109   114   119   124   129
 894  95 134   139   143   148   153   158   163   168   173   177
 895  95 182   187   192   197   202   207   211   216   221   226
 896  95 231   236   240   245   250   255   260   265   270   274
 897  95 279   284   289   294   299   303   308   313   318   323
 898  95 328   332   337   342   347   352   357   361   366   371
 899  95 376   381   386   390   395   400   405   410   415   419
 900  95 424   429   434   439   444   448   453   458   463   468

Prop. Parts
Diff.    1    2    3    4    5    6    7    8    9
   6   0.6  1.2  1.8  2.4  3.0  3.6  4.2  4.8  5.4
   5   0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5
   4   0.4  0.8  1.2  1.6  2.0  2.4  2.8  3.2  3.6



V. 9000 - LOGARITHMS OF NUMBERS - 9509 [FIVE-PLACE]

   N       0     1     2     3     4     5     6     7     8     9
 900  95 424   429   434   439   444   448   453   458   463   468
 901  95 472   477   482   487   492   497   501   506   511   516
 902  95 521   525   530   535   540   545   550   554   559   564
 903  95 569   574   578   583   588   593   598   602   607   612
 904  95 617   622   626   631   636   641   646   650   655   660
 905  95 665   670   674   679   684   689   694   698   703   708
 906  95 713   718   722   727   732   737   742   746   751   756
 907  95 761   766   770   775   780   785   789   794   799   804
 908  95 809   813   818   823   828   832   837   842   847   852
 909  95 856   861   866   871   875   880   885   890   895   899
 910  95 904   909   914   918   923   928   933   938   942   947
 911  95 952   957   961   966   971   976   980   985   990   995
 912  95 999  *004  *009  *014  *019  *023  *028  *033  *038  *042
 913  96 047   052   057   061   066   071   076   080   085   090
 914  96 095   099   104   109   114   118   123   128   133   137
 915  96 142   147   152   156   161   166   171   175   180   185
 916  96 190   194   199   204   209   213   218   223   227   232
 917  96 237   242   246   251   256   261   265   270   275   280
 918  96 284   289   294   298   303   308   313   317   322   327
 919  96 332   336   341   346   350   355   360   365   369   374
 920  96 379   384   388   393   398   402   407   412   417   421
 921  96 426   431   435   440   445   450   454   459   464   468
 922  96 473   478   483   487   492   497   501   506   511   515
 923  96 520   525   530   534   539   544   548   553   558   562
 924  96 567   572   577   581   586   591   595   600   605   609
 925  96 614   619   624   628   633   638   642   647   652   656
 926  96 661   666   670   675   680   685   689   694   699   703
 927  96 708   713   717   722   727   731   736   741   745   750
 928  96 755   759   764   769   774   778   783   788   792   797
 929  96 802   806   811   816   820   825   830   834   839   844
 930  96 848   853   858   862   867   872   876   881   886   890
 931  96 895   900   904   909   914   918   923   928   932   937
 932  96 942   946   951   956   960   965   970   974   979   984
 933  96 988   993   997  *002  *007  *011  *016  *021  *025  *030
 934  97 035   039   044   049   053   058   063   067   072   077
 935  97 081   086   090   095   100   104   109   114   118   123
 936  97 128   132   137   142   146   151   155   160   165   169
 937  97 174   179   183   188   192   197   202   206   211   216
 938  97 220   225   230   234   239   243   248   253   257   262
 939  97 267   271   276   280   285   290   294   299   304   308
 940  97 313   317   322   327   331   336   340   345   350   354
 941  97 359   364   368   373   377   382   387   391   396   400
 942  97 405   410   414   419   424   428   433   437   442   447
 943  97 451   456   460   465   470   474   479   483   488   493
 944  97 497   502   506   511   516   520   525   529   534   539
 945  97 543   548   552   557   562   566   571   575   580   585
 946  97 589   594   598   603   607   612   617   621   626   630
 947  97 635   640   644   649   653   658   663   667   672   676
 948  97 681   685   690   695   699   704   708   713   717   722
 949  97 727   731   736   740   745   749   754   759   763   768
 950  97 772   777   782   786   791   795   800   804   809   813

Prop. Parts
Diff.    1    2    3    4    5    6    7    8    9
   5   0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5
   4   0.4  0.8  1.2  1.6  2.0  2.4  2.8  3.2  3.6
V. 9500 - LOGARITHMS OF NUMBERS - 10009 [FIVE-PLACE]

   N       0     1     2     3     4     5     6     7     8     9
 950  97 772   777   782   786   791   795   800   804   809   813
 951  97 818   823   827   832   836   841   845   850   855   859
 952  97 864   868   873   877   882   886   891   896   900   905
 953  97 909   914   918   923   928   932   937   941   946   950
 954  97 955   959   964   968   973   978   982   987   991   996
 955  98 000   005   009   014   019   023   028   032   037   041
 956  98 046   050   055   059   064   068   073   078   082   087
 957  98 091   096   100   105   109   114   118   123   127   132
 958  98 137   141   146   150   155   159   164   168   173   177
 959  98 182   186   191   195   200   204   209   214   218   223
 960  98 227   232   236   241   245   250   254   259   263   268
 961  98 272   277   281   286   290   295   299   304   308   313
 962  98 318   322   327   331   336   340   345   349   354   358
 963  98 363   367   372   376   381   385   390   394   399   403
 964  98 408   412   417   421   426   430   435   439   444   448
 965  98 453   457   462   466   471   475   480   484   489   493
 966  98 498   502   507   511   516   520   525   529   534   538
 967  98 543   547   552   556   561   565   570   574   579   583
 968  98 588   592   597   601   605   610   614   619   623   628
 969  98 632   637   641   646   650   655   659   664   668   673
 970  98 677   682   686   691   695   700   704   709   713   717
 971  98 722   726   731   735   740   744   749   753   758   762
 972  98 767   771   776   780   784   789   793   798   802   807
 973  98 811   816   820   825   829   834   838   843   847   851
 974  98 856   860   865   869   874   878   883   887   892   896
 975  98 900   905   909   914   918   923   927   932   936   941
 976  98 945   949   954   958   963   967   972   976   981   985
 977  98 989   994   998  *003  *007  *012  *016  *021  *025  *029
 978  99 034   038   043   047   052   056   061   065   069   074
 979  99 078   083   087   092   096   100   105   109   114   118
 980  99 123   127   131   136   140   145   149   154   158   162
 981  99 167   171   176   180   185   189   193   198   202   207
 982  99 211   216   220   224   229   233   238   242   247   251
 983  99 255   260   264   269   273   277   282   286   291   295
 984  99 300   304   308   313   317   322   326   330   335   339
 985  99 344   348   352   357   361   366   370   374   379   383
 986  99 388   392   396   401   405   410   414   419   423   427
 987  99 432   436   441   445   449   454   458   463   467   471
 988  99 476   480   484   489   493   498   502   506   511   515
 989  99 520   524   528   533   537   542   546   550   555   559
 990  99 564   568   572   577   581   585   590   594   599   603
 991  99 607   612   616   621   625   629   634   638   642   647
 992  99 651   656   660   664   669   673   677   682   686   691
 993  99 695   699   704   708   712   717   721   726   730   734
 994  99 739   743   747   752   756   760   765   769   774   778
 995  99 782   787   791   795   800   804   808   813   817   822
 996  99 826   830   835   839   843   848   852   856   861   865
 997  99 870   874   878   883   887   891   896   900   904   909
 998  99 913   917   922   926   930   935   939   944   948   952
 999  99 957   961   965   970   974   978   983   987   991   996
1000  00 000   004   009   013   017   022   026   030   035   039

Prop. Parts
Diff.    1    2    3    4    5    6    7    8    9
   5   0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5
   4   0.4  0.8  1.2  1.6  2.0  2.4  2.8  3.2  3.6
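The entries above can be checked, and the Prop. Parts applied, by machine. The Python sketch below is an illustration added for that purpose, not a method from the text. The helper names `mantissa5`, `table_row`, and `interpolate` are hypothetical. The sketch assumes that halves are rounded upward, and it takes an asterisk to mark an entry whose leading two figures have advanced beyond those printed at the start of its row (as in rows 776, 794, and 812). Interpolation follows the proportional-parts rule: to the tabular mantissa of the first four figures, add the fifth figure times one-tenth of the tabular difference.

```python
from __future__ import annotations

import math


def mantissa5(n: int) -> int:
    """Five-place mantissa of log10(n) as an integer from 0 to 99999.

    The characteristic (integer part) is dropped, as in the printed table.
    Halves are rounded upward here; that rounding rule is an assumption.
    """
    fraction = math.log10(n) % 1.0
    # The final modulo folds a mantissa that rounds up to 100000 back to 0.
    return math.floor(fraction * 100_000 + 0.5) % 100_000


def table_row(n3: int) -> list[str]:
    """Printed cells of the table row for the three-figure number n3 (e.g. 769)."""
    values = [mantissa5(n3 * 10 + d) for d in range(10)]
    lead = values[0] // 1000  # leading two figures, printed once per row
    cells = [f"{lead:02d} {values[0] % 1000:03d}"]
    for v in values[1:]:
        # Assumed convention: mark entries whose leading figures have moved
        # past those printed in column 0.
        mark = "*" if v // 1000 != lead else ""
        cells.append(f"{mark}{v % 1000:03d}")
    return cells


def interpolate(n5: int) -> int:
    """Mantissa of a five-figure number (10000 to 99999) by proportional parts."""
    n4, fifth = divmod(n5, 10)
    low, high = mantissa5(n4), mantissa5(n4 + 1)
    difference = (high - low) % 100_000  # tabular difference; wraps past 9999
    prop_part = difference * fifth / 10  # the "Prop. Parts" entry
    return math.floor(low + prop_part + 0.5) % 100_000


if __name__ == "__main__":
    print(table_row(769))      # row 769 of the table
    print(table_row(776))      # a row that uses the asterisk
    print(interpolate(76943))  # mantissa for the figures 76943 -> 88617
```

For example, `interpolate(76943)` takes the entry 88 615 for 7694, a tabular difference of 6 (to 88 621), and the proportional part 1.8 for a fifth figure of 3, giving 88 617 as the mantissa for any number with the figures 76943.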
INDEX

Abscissa, 244, 634
Accuracy of estimate, 252, 339, 341, 563
Adding machine, use of, 12, 14
Addition, algebraic, 12; fractions, 16; decimals, 20; ratios, 46
Additive relationship, 417, 420, 606
Aggregative index numbers, 448, 451; simple, 448
Agriculture, index, 75, 76; weighted, 450
Aliquot parts, 19
Almack, E. B., 442
Amplitude of cycle, 429; seasonal variation, 396, 411
Analysis of covariance, 528, 530, 534, 541, 545, 547
Analysis of large samples, 171, 465, 466
Analysis of small samples, 465; purpose, 467; experimental design, 468; Latin squares, 471; means, 480; regression, 483; correlation, 486; χ², 490
Analysis of variance, 499, 501, 511, 514, 519; experimental error, 516
Anti-logarithm, 31, 33
Arbitrary origin, 176, 218 (see assumed mean)
Areas of the normal curve, 332, Appendix Table II
Arithmetic mean, 173; of percentages, 46, 47; unorganized data, 174; long method, 176, 179; short method, 177, 179; weighted, 180, 182; averaging a group of means, 182; use of, 183; characteristics of, 183, 202, 342; distribution of, 341, 343; standard error of, 341, 480; of small samples, 474
Arithmetic scale, 147, 168
Array, 95, 97
Averages, 171, 172; simple, 174; methods of computation, 174, 177; arithmetic, 177, 179; weighted, 180, 185; geometric, 183, 186; harmonic, 189; median, 194; mode, 201; root-mean-square, 221; moving, 372, 408, 410; of price relatives, 452
Average deviation, 212; methods of computation, 213; use, 213, 226 (see mean deviation)
Axes, 141, 244; X-axis, 244, 635; Y-axis, 244, 635

Babson, Roger W., 365
Band charts, 156, 158
Bar chart, 150; horizontal, 150, 152; vertical, 152, 153; compound, 152, 153; methods of making, 154
Base period, 443
Bennett, K. R., 606
Beta coefficients, 603
Bias, 318; sampling, 318, 320; index numbers, 442, 444
Binomial equation, 30, 326, 328; distribution, 30, 330; coefficients, 30, 328; application to sampling, 326; error, 326; normal curve, 326, 330
Burns, Arthur F., 425
Business cycles, 415, 417, 425; forecasting, 361; deviation from normal, 416, 419

Calculating machines, use of, 12, 14
Central tendency, 171, 172; measures, 172 (see averages)
Charts, 141; rules for making, 141, 147, 151, 154; line, 142, 145; Z-charts, 144; percentage, 146, 149; bar, 150, 153; band, 156; pie, 158, 159; Lorenz, 157; plus and minus, 160; map, 161, 164; pictograph, 165, 167
Checking accuracy of data, 79
Chi-square, 488, 496; computation of, 490, 492; uses of, 490, 492, 496; tables of, 491; test of goodness of fit, 492; test of normality, 492, 493; test of homogeneity, 494
Classification, 116; bases of, 116; time, 116; space, 117; quantity, 117; quality, 117
Class intervals, 100; rules for forming, 101; number, 101, 108; size, 101, 108; equal, 103; unequal, 103; concentration of data, 104; mid-point, 104; by formula, 105; by graphic method, 105
Coding data, 89
Coefficient of correlation, 264, 269; computation of r, 271, 273; Pearsonian coefficient, 273; of ungrouped data, 275; for totals of original data, 277; correlation tables, 278, 2^; with class intervals, 279; rank differences, 283; limitations of, 284; standard error of, 344; tests of significance, 486; small samples, 486; computation of P, 556; computation of R, 584, 588; of partial correlation, 598, 600, 602; tables of relations to z functions, Appendix Table III
Coefficient of determination, 265, 269; simple, 265, 272; relation to coefficient of correlation, 265, 272, 284; use, 265, 271; multiple, 285, 596, 604; percentage, 555; curvilinear, 555, 561
Coefficient of variation, 225
Coin tossing, 327, 329
Collecting data, 69
Common fractions, 16
Compound interest, law of, 188, 574
Constants, 2, 45, 622
Continuous series, 94
Controls in sampling procedure, 318; large samples, 319; small samples, 468, 470
Corn-hog cycle, 361
Correction, 176, 251, 254, 503, 515, 532, 568
Correlation, 264, 269; uses, 264, 284; simple, 265; positive, 267; negative, 267; linear, 269, 274; tabular, 279; limitations, 284; standard errors of, 344, 486, 607; time series, 423, 435; curvilinear, 556, 604; partial, 598, 600, 602; multiple, 604, 606; joint, 606, 607
Critical ratio, 512
Cumulative charts, 199
Curve fitting, 232, 551; free hand, 233, 552; class averages, 237, 552; least squares, 242, 246, 556; mathematical, 242, 556; straight line, 244, 249; time series, 374, 383; curvilinear, 552, 557, 563, 572, 575
Curvilinear regression, 550, 553, 557, 567, 575; simple, 384, 390, 557; relation to correlation, 570; to determination, 570; multiple, 606
Cyclical fluctuations, 415, 435; residual, 417, 423; methods of, 418, 419; forecasting, 419, 424; measurement, 421, 423; direct, 425, 427, 429, 434; amplitude, 426; length, 428

Data, 1; primary, 61, 69; secondary, 61, 69; coding, 89; continuous, 94; discrete, 94
Decile, 197
Degrees of freedom, 478, 502; t-tests, 477, 481, 485; computation of error, 479; in small samples, 480, 483; in correlation analysis, 486; for Chi-square, 488, 491; analysis of variance, 502, 519; analysis of covariance, 536, 539
Dependent variable, 231, 244, 250, 583, 639; symbols, 244, 583; meaning, 639
Determination, 265; relation to coefficient of correlation, 265, 284; coefficient of, 271; simple, 277; multiple, 584, 588, 596; separate, 588, 598; standard error of, 589
Deviation, 208, 226; range, 210; quartile, 211; average, 212; mean deviation, 212, 220; from ungrouped data, 212; computation of, 213, 216; from mean, 214; standard deviation, 214, 218, 220; from class intervals, 216, 218; root-mean-square, 221; uses in correlation, 269, 273; standard error of standard deviation, 344; from moving averages, 374, 377; from trends, 386; in analysis of variance, 499, 502, 510
Differences, first, 283; paired, 283
Discrepancy, 341, 516, 517
Discrete series, 94
Dispersion, 208, 220; measures of, 208, 220; range, 208; quartile deviation, 211; mean deviation, 213; standard deviation, 214, 216; comparison of measures, 219, 220
Distribution, 93, 102; frequency, 93, 328, 334; description of, 328, 337; normal curve, 331, 334
Division, algebraic, 15; fractions, 18; decimals, 20
Doolittle solution, 566, 568, 587, 593

Economics, publications, 77
Education, publications, 77
Equations, 27, 29; binomial, 30, 245, 326; regression, 246, 557, 563, 565; correlation, 271, 273, 276, 281, 284, 588, 596; error, 343, 344, 346, 348, 607; time series, 379, 388; logarithmic, 389, 572; exponential, 574; compound interest, 574; Pearl-Reed, 575; logistic, 575
Error, 341, 349, 516; sampling, 318, 337, 516; normal curve, 328, 331, 335; standard, 337; probable, 337, 348; of mean, 343; standard deviation, 344; coefficient of correlation, 344, 607
Experimental design, 468, 470, 472
Exponential curve, 574
Exponents, 29
Extrapolation, 392, 393
Ezekiel, Mordecai, 386, 595, 605

F-ratio, 502, 505, 515; variance ratio, 502; F-table, 506, 509; covariance ratio, 543
Federal publication, 73
Free-hand curves, 233, 236, 239, 605
Frequency curve, 331, 335; histogram, 106, 108; polygon, 110; normal, 223, 332; skewed, 223; equations for, 331, 333
Frequency distribution, 93, 98; location of class intervals, 100; number of classes, 100; class intervals, 101; unequal classes, 101; size of classes, 101, 102; J-shaped, 111; characteristics of, 113; purpose of, 113, 175; multi-modal, 221; cumulative normal, 222, 331; skewed, 222
Frequency polygon, 110
Frequency tables, 119, 134, 175; one-way tables, 123, 290; two-way tables, 124, 294; three-way tables, 125, 297; four-way tables, 127; cumulative, 134, 137

Gauss, Karl F., 326; normal law of error, 328, 331
Geometric mean, 184, 186; definition of, 183; characteristics of, 184, 187; computation of, 184, 186; ungrouped data, 185; weighted, 185; index numbers, use of logarithms, 185; from class intervals, 186; uses of, 188; limitations of, 203
Glossary of symbols, Appendix I, 622
Goodness of fit, 335; standard error of estimate, 247, 596; criteria, 343, 344, 346; Chi-square test, 488, 492 (see Chi-square)
Graphic methods, 141; locating averages, 199; median, 199; quartiles, 199; quintiles, 199; mode, 203
Graphic presentation, 141; frequency distributions, 107, 111; line graphs, 142, 145; natural scale, 143, 145; advantages of, 144, 146, 150, 168; z-chart, 144; ratio charts, 146, 149; limitations of, 151, 155

Harmonic mean, 189; characteristics of, 190; computation of, 191, 193; from ungrouped data, 191; from class intervals, 191; use of reciprocals, 191, 192
Histogram, 106, 108
Hog-corn ratio, 361
Homogeneity, 488, 496; Chi-square tests, 488, 494; test of, 493, 495
Hypothesis, null, 476

Independent variable, 231, 244, 250, 289, 583, 641; simple, 244; symbols, 244, 583; multiple, 639
Index numbers, 438, 462; nature of, 439, 441; use of, 441, 452, 459, 461; selecting, 442; base, 443, 458; type of averages, 443, 448, 452, 458, 461; weighted, 444, 450, 456; unweighted, 444, 448, 452; weights, 444, 445, 454; quantity weights, 444, 450; value weights, 444, 456; tests for, 446, 461; aggregate, 448, 450; average of price relatives, 452, 455; shifting bases, 458; of production relatives, 459; socio-economic indexes, 460
Index of variability, 225
Induction, statistical, 258, 339; nature of, 259, 336, 340; limitations, 259, 337, 340; large samples, 336, 340; time series, 392; small samples, 472, 475
Interpolation, 201, 392; mode, 201, 202; trend values, 392
Irregular variations, time series, 362

J-curve, 111
Jerome, Harry, 336
Joint correlation, 606
Joint relationships, 606; tabular analysis, 287, 299, 311; analysis of variance, 502, 515, 521

Kurtosis, 224
Kuznetz, Simon, 425

Lag, 641; time series, 361, 392, 397, 435
Latin square, 470, 472, 544
Law of large numbers, 316
Least-squares, 242, 245, 254, 378
Likelihood, criterion of, see probability
Line charts, 141, 143, 146
Linear correlation, see correlation
Link relatives, 402
Lively, C. E., 442
Logarithmic regression lines, 572
Logarithms, 30; characteristic, 31; common, 31; mantissa, 31; multiplication, 32; e or Naperian, 331, 642; Appendix Table V
Logistic curve, 575, 578
Log-lines, 389, 390
Lorenz curve, 157

Mantissa, 31, 33; Appendix Table V
Map, 161, 164
Mean, 173, 183; arithmetic, 173; definition of, 173, 180, 183; from ungrouped data, 174; from class intervals, 176, 177; long method, 176; short method, 177; characteristics of, 183, 203; algebraic treatment of, 183, 203; effect of extreme values on, 203; use of, 203; quadratic, 221; standard error of, 431; reliability, 481
Mean deviation, 212, 220
Mean, geometric, 183; definition, 183; computation, 184, 186; use of, 184, 188; from individual items, 185; weighted, 185; from class intervals, 186
Mean, harmonic, 189; definition, 190; computation, 191, 192; from individual items, 191; weighted, 191; use of, 204
Median, 194; definition, 194; from formulas, 195; graphic determination, 199; relation to mean, 203; characteristics of, 204
Methods, of research, 9, 57, 60, 606; of statistics, 6
Mid-points of classes, 104
Mills, Frederick C., 606
Mises, Richard von, 322, 644
Mitchell, W. C., 425
Mixed numbers, 18
Mode, 200; definition, 200; ungrouped data, 200; class intervals, 201; interpolation formula, 201; location of, 201; computation of, 201, 202; locational formula, 202; graphic method, 203; characteristics, 204; use of, 204
Monthly trend values, 381, 382, 388
Moody's, 365
Moving average, 372, 408, 410; methods of computation, 372, 374; definition, 373, 376; uses of, 373, 377; as trends, 373, 377; in cycles, 374; characteristics of, 375; as basis of seasonal variation, 405
Multiple correlation, 581, 585, 590, 604; meaning of, 581, 606; coefficient of, 583; mathematical, 583, 585; relations to net regression equations, 583; straight line, 584, 588; curvilinear, 584, 604; limitations of, 604, 606; uses of, 606; standard error of, 607
Multiplication, algebraic, 15; fractions, 17; decimals, 20

Naperian logarithms, 331
National Bureau of Economic Research, 425
Natural numbers, tables of squares, Appendix Table IV; square roots, Appendix Table IV; reciprocals, Appendix Table IV
Net-regression coefficients, 583, 594
Non-government indexes, 75
Non-linear correlation, 555, 561, 581, 606
Non-linear regression, 550, 554, 557, 563, 574
Normal curve, 328, 331; binomial, 328; equation, 331, 334
Normal deviate, 345
Normal equations, 246, 249; derivation of, 247; use of, 246, 249
Normal law of error, 328, 331, 336, 337, 341
Normal probability curve, 331, 332, 335
Normal values, 420, 423
Null-hypothesis, 476; t-tables, 477; definition, 478; meaning, 478; to small samples, 479; relation to research, 481; correlation, 486; analysis of variance, 503

Ordinate, 244, 634
Organization of data, 57, 60, 94, 116; array, 95, 97; tally sheets, 98, 102; class intervals, 101, 105; bases of, 116; tabulation, 116; tables, 287
Organization table, 118, 129, 289, 294; uses, 116, 287; types, 118, 290, 294, 297; construction, 119, 123, 125; definition, 289
Origin, 176, 178, 244; arbitrary, 177; point of, 178, 244

Parabola, 359, 387, 563; simple, 359, 387, 556; equations, 385, 557, 565; uses, 550, 570, 571; cubic, 563
Parameter, 321
Partial correlation, 598; definition, 598; equations, 600; uses, 601, 602
Peakedness, 224 (see Kurtosis)
Pearson, F. A., 606
Pearson, Karl, 273
Percentages, 22, 37
Percentile, 197
Pictographs, 165, 167
Pie charts, 159
Powers, 29
Price relative, 440, 452
Primary data, 61, 69
Probability, 321, 322, 325; principle of, 322, 327; additions of, 324; joint, 325; measurement of, 326, 328; normal, 330, 332, 334
Probability curve, 327, 328, 331, 333; equations, 328, 331; use of, 336; physical data, 336; biological data, 336; social data, 338
Probable error, 341, 348
Projection of trend, 392
Purposive selection in sampling, 320; procedure, 317, 320

Quantitative, 57, 60 
Quantity, 117, 118 
Quartile, 196; graphic presentation, 
199; deviation, 211, 226; use of, 
211, 222 

Questionnaire, 86; to whom to send, 
87 

Quintile, 197 

Random sampling, 315, 317, 320 
Range of variation, 210, 220, 226; 
semi-quartile, 211; coefficient of, 
224 

Rank correlation, 283 
Rate of interest, 188, 574 
Ratio chart, 146, 14^ 168 
Ratios, 37; bases of, 38; like items, 
42; unlike units, 43; averaging, 
46; popular ratios, 47, 49, 51 
Reciprocals, 22; Appendix Table IV 
Regression, 229, 550; definition, 230, 
245; lines, 232, 244, 369, 392, 550, 
553, 557, 563, 572, 575; computa- 
tion, 232, 236, 552, 558, 563; 
simple, 233, 244, 379; free-hand, 
233, 236; through class averages, 
237, 552; mathematical, 242, 246; 
straight, 244, 254; standard error 
of estimate, 247, 252, 256, 554, 
562; use of, 258, 260, 270; mul- 
tiple, 550, 583, 584; curvilinear, 
552, 557, 563, 568, 572, 576, 578; 
net, 584, 594 
Relative price, 440, 449 
Reliability, 341, 349, 486; coefficients
of regression, 247, 483; standard
deviation, 334; means, 341; meas-
ures of, 341, 348, 483; coefficients
of correlation, 344, 347, 486; dif-
ferences between means, 348, 479
Residuals, 248, 385 
Root-mean-square average, 221; de-
viation (see standard deviation)
Rounding numbers, 23 

Sample, size of, 321, 349, 466; ran- 
dom, 200, 317; stratified, 320; 
purposive, 320; reliability of, 476 
Sampling, 314, 349; theory of, 315, 
317; random, 317, 320; large, 317, 
321; problems of, 318, 320; pur- 
posive, 320; stratified, 320; errors 
of, 342, 344, 349; small, 465, 467, 
468, 470, 475 

Scales, 141, 144, 146, 149, 152, 154, 
163 

Scatter, 208, 256; degree of, 210, 212, 
220; measure of, 211, 216, 247 
Scatter diagram, 235, 243, 256 
Schedules, 82; testing, 86; editing, 88 
Seasonal variation, 395; amplitude, 
396; methods, 398, 400, 401, 402, 
405; moving average, 406, 410
Secondary data, 61, 69 
Secular trend, 369; class averages,
236; least squares, 242, 378; meas-
urement of, 370; free-hand, 370;
moving averages, 372, 374; monthly 
trends, 381, 382; parabolas, 384 
Semi-inter-quartile range, 211, 226 
Semi-logarithmic charts, 146, 149; 
construction, 141, 147, 168; use, 
145, 148, 168; ratio charts, 146 
Separate determination, 588, 598, 604 
Series, 93, 352; time, 58, 352; spatial, 
58, 352; discrete, 94; continuous, 94 
Significance, 479, 488, 491, 505; mean- 
ing of, 480, 505; test of, 481, 506, 
509; 5-percent, 481, 514; 1-per- 
cent, 481, 514; highly significant, 
481 

Significant figures, 24 
Skewness, 222; definition, 222; meas- 
ure, 222, 223 

Slope of regression line, 239, 243, 244, 
247 
Small samples, 465, 493; purpose of,
467; how obtained, 468; experi- 
mental design, 469; Latin square, 
470; error of, 472, 475 
Smithsonian Miscellaneous Collec-
tions, 620

Snedecor, George, 427, 506
Sociology, publications, 77 
Sources of data, 71, 73, 75, 78, 361, 
365 

Spatial series, 352 

Square root, 25; table of, Appendix 
Table IV 

Standard and Poor’s, 306

Standard deviation, 214, 226; com- 
putation, 215, 218; ungrouped 
data, 215; uses, 215, 220, 225; 
long method, 216; class intervals, 
216; short method, 218; charac-
teristics of, 219, 226; correlation,
226; error, 344; variance, 499, 501 
Standard error, 337; binomial, 328,
331; distribution, 332, 337; of
mean, 342; standard deviation, 
344; coefficient of correlation, 344; 
of z-f unctions, 346; differences of 
means, 348; regression lines, * b, 
483, 485

Standard error of estimate, 248, 258, 
273, 386; significance of, 247, 255, 
258, 260; computation, 248, 253, 
273; short cut methods, 252, 255; 
zone of estimate, 256; time series, 
386; parabola, 562, 569; multiple 
for straight line regression, 589, 596 
Statistic, 8, 321 
Statistical methods, 6 
Statistical problem, organization of,
57, 60, 63, 64, 65 

Statistical reports, 610, 620; scien- 
tific, 611, 612, 620; business, 613, 
617, 621; popular, 619, 621 
Statistical theory, 6

Statistical unit, 58, 59
Statistics induction, 258, 259, 339 
Stratified sample, 319 
Subtraction, algebraic, 13; fractions,
17; decimals, 20 

Symbols, glossary of, Appendix I


Tables, 116, 287; cross classification
tables, complete, 123, 127; incom-
plete, 128, 131; percentage, 132;
cumulative, 135; t-table, 477; Chi-
square, 491; F-tables, 506-509;
ordinates of normal curve, Appen-
dix Table I; area under normal
curve, Appendix Table II; z-table,
Appendix Table III; powers of
natural numbers, Appendix Table
IV; square root tables, Appendix
Table IV; reciprocals, Appendix
Table IV; logarithms, Appendix
Table V

Tabular analysis, 287, 312; data 
tables, 287, 301; organization 
tables, 289, 294, 297; summary 
tables, 291, 295, 300, 308; num-
ber of class intervals, 298, 299, 302;
uniform classes, 303; computation 
of error, 311; advantages, 311; 
disadvantages, 311 
Tabulation of data, 96, 99, 117, 287;
array, 95; tally sheet, 98, 102;
class intervals, 100, 104; bases of
classification, 116, 298, 303; types 
of tables, 118; stubs, 119; cap- 
tions, 119; positions of emphasis, 
120 

Tally sheet, 98, 102 
Time series analysis, 352; complexity
of, 353; removal of trend, 355,
369, 381; removal of seasonal vari-
ation, 360, 395; removal of cycles,
361; irregular changes, 362

Trends, 230, 233, 237, 242, 244, 246,
349, 372, 378, 381, 392

Ungrouped data, 96, 174 

United States Census data, 71, 73

United States government indexes of 
data, 71 

Unweighted averages, 180, 444; in-
dexes, 448, 452
Use of libraries, 70 

Validity of data, testing, 80 
Variability, measure of, 225 
Variable, 231

Variance, analysis of, 499, 502, 517

Weighted averages, 180, 182, 185,
191; indexes, 448, 452

Worksheets, 96, 133, 177

X-axis, 244, 635

Y-axis, 244, 635

z-test of correlation coefficients, 346;
Appendix Table III


